Deep Learning CA1¶

Name: Law Wei Tin
Class: DAAA/FT/2A/02
Admin number: 2415761

Background Research¶

Class Characteristics¶




  • Bean

General Description:
The images display beans with an elongated and thin shape. Their textures are smooth, and they are generally green.

Potential Challenges:
This class might be confused with Bottle Gourd or Cucumber due to their similar elongated shape, so our model might misclassify between them.




  • Bitter Gourd

General Description:
They are elongated and bumpy, with a rough texture. They are green in color and somewhat resemble beans.

Potential Challenges:
This bumpy and rough texture might be lost at low resolution. Similarly, the model might mistake this class for a Bean or Cucumber.




  • Brinjal

General Description:
Round, oval shape with a smooth and shiny texture. They generally have a purple body and a green head.

Potential Challenges:
Brinjals are easily distinguishable when color is retained. However, color is not retained due to assignment specifications (grayscale inputs), so this advantage is lost.




  • Cabbage

General Description:
The cabbages have a round and leafy shape. They have a greenish white color, and a layered, leafy texture.

Potential Challenges:
At low resolutions, this class might resemble a cauliflower due to their similar round shape.




  • Capsicum

General Description:
They are generally a bit blocky and glossy, with a smooth texture and a relatively wide variety of colors (red, green, yellow).

Potential Challenges:
Color is not retained due to assignment specifications, making the capsicum class lose one of its most distinguishable properties.




  • Cauliflower and Broccoli

General Description:
These two vegetables have a floret-shaped structure, and a granular texture. They are white or green in color.

Potential Challenges:
This class is generally easy to distinguish due to its distinct floret structure, although it is uncertain whether that structure is preserved at low resolutions.




  • Cucumber and Bottle_Gourd

General Description:
They are elongated with a smooth texture. Both are green in color.

Potential Challenges:
Their elongated shape is similar to Beans, which might pose a problem for our model.




  • Potato

General Description:
Potatoes have a circular/oval shape with a slightly rough texture. Potatoes have a brownish-yellow color.

Potential Challenges:
They might easily be confused with other round vegetables/fruits, such as Pumpkin or Tomato, especially at low resolutions.




  • Pumpkin

General Description:
Pumpkins are large and round, have a ribbed texture, and have an orange body with a green tip.

Potential Challenges:
Easily confused with Potatoes or Tomatoes due to their similar round structure, especially at low resolutions.




  • Radish and Carrot

General Description:
These two have a distinct tapered, pointy structure. They are white/orange/red in color and have a smooth texture.

Potential Challenges:
This class is generally easy to distinguish, aside from its distinctive tapered structure potentially being lost at low resolutions.




  • Tomato

General Description:
Tomatoes are round, glossy and smooth. They are also red in color.

Potential Challenges:
Easily confused with Potatoes or Pumpkins due to their similar round structure, especially at low resolutions.




Dataset implications:¶


After a preliminary analysis of the dataset, we can observe that certain things are out of place. For example, the class folder names differ between the train, validation and test sets. We will delve deeper into this later.

Furthermore, looking closely into the folders provided in the train dataset, we can find images in the wrong classes. For example, there are approximately 11 carrot images in the 'Bean' folder of the train dataset. Such mislabeled data can distort model training and lead to incorrect insights. Therefore, we will handle this during our exploratory data analysis.




CNN Architectures:¶


Since this assignment focuses on convolutional neural networks (CNNs), we began by exploring some of the most influential and widely used CNN architectures in the field. Each of the models below brings different strengths to the table, and by implementing and comparing them on our vegetable-fruit dataset, we not only gain a deeper understanding of how architectural choices affect performance, but also build a robust classifier tailored to our problem.


  • Custom CNN

Why? Serves as our lightweight, task-specific baseline. By hand-crafting the number and size of convolutional blocks, dropout rates, and classifier head, we can directly observe how each design decision impacts accuracy on small (23x23) grayscale images.
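As a rough illustration of such a baseline (the block counts, filter sizes, and dropout rate below are placeholder assumptions, not the final tuned values), a minimal custom CNN for 23x23 grayscale inputs might look like:

```python
import tensorflow as tf

# Hypothetical baseline: two small conv blocks + dropout + dense head.
# All layer sizes here are illustrative assumptions.
def build_custom_cnn(input_shape=(23, 23, 1), n_classes=11):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(n_classes, activation='softmax'),
    ])

model = build_custom_cnn()
print(model.output_shape)  # (None, 11)
```

Because every layer is hand-specified, swapping a single hyperparameter (e.g. the dropout rate) isolates its effect on accuracy.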


  • VGG-Style CNN

Why? Emulates the simplicity and depth of the classic VGG family—stacked 3x3 convolutions with pooling—that's known for very clean feature hierarchies. Although originally designed for 224x224 RGB inputs, a "mini-VGG" adapted to our 23x23 and 101x101 grayscale inputs shows how deeper, uniform layers can improve representational power.
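The appeal of stacked 3x3 convolutions can be seen with a quick parameter count: two 3x3 layers cover the same 5x5 receptive field as a single 5x5 layer, but with fewer weights (the channel count below is an arbitrary example):

```python
# Weights needed for the same 5x5 receptive field,
# assuming C input channels and C output channels per layer (biases ignored).
C = 64  # arbitrary example channel count

single_5x5 = 5 * 5 * C * C          # one 5x5 conv layer
stacked_3x3 = 2 * (3 * 3 * C * C)   # two stacked 3x3 conv layers

print(single_5x5, stacked_3x3)  # 102400 73728
```

The stacked version is also deeper, inserting an extra non-linearity between the two layers.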


  • ResNet-50 (Mini-ResNet Variant)

Why? Introduces residual connections that help gradient flow through very deep networks, preventing vanishing gradients. Even a slimmed-down "mini-ResNet" demonstrates how identity shortcuts let us stack more layers without performance degradation—critical when exploring depth vs. input resolution trade-offs.
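The identity shortcut can be sketched in NumPy: the block's output is its transformation plus the unchanged input, so even when the learned transformation is near zero, information (and gradient) still flows through. The `transform` callable here is a stand-in for the block's convolutions:

```python
import numpy as np

def residual_block(x, transform):
    # y = relu(F(x) + x): the identity shortcut adds the input back
    return np.maximum(transform(x) + x, 0)

x = np.array([1.0, 2.0, 3.0])

# Even a zero transformation passes the input straight through
identity_out = residual_block(x, lambda v: np.zeros_like(v))
print(identity_out)  # [1. 2. 3.]
```

This is why stacking more residual blocks cannot easily make the network worse: a block can always fall back to the identity.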


  • DenseNet-Style CNN

Why? Uses dense connectivity, where each layer receives the outputs of all previous layers. This encourages maximal feature reuse and reduces overall parameter count. On small-input datasets, DenseNets often generalize well, making them a valuable counterpoint to both plain and residual CNNs.
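Dense connectivity can be sketched as repeated channel-wise concatenation: after each layer, the feature map's channel count grows by the layer's growth rate (the sizes below are arbitrary examples):

```python
import numpy as np

growth_rate = 12
x = np.zeros((8, 8, 16))  # H x W x C feature map, arbitrary sizes

for _ in range(4):  # four dense layers
    # Stand-in for a conv layer producing `growth_rate` new channels
    new_features = np.zeros((8, 8, growth_rate))
    # Each layer's input is the concatenation of all previous outputs
    x = np.concatenate([x, new_features], axis=-1)

print(x.shape[-1])  # 16 + 4 * 12 = 64
```

Because each layer only needs to produce a small number of new channels, the total parameter count stays low despite the depth.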


  • MobileNet-Lite

Why? Employs depthwise separable convolutions to dramatically cut computation and model size while retaining accuracy. This is especially important for low-resolution inputs or edge-device deployment. Comparing MobileNet-Lite with our heavier models illustrates the trade-off between efficiency and representational capacity.
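The savings from depthwise separable convolutions follow from simple arithmetic: a standard KxK convolution costs K·K·Cin·Cout weights, while the depthwise + pointwise factorization costs K·K·Cin + Cin·Cout (the channel counts below are arbitrary examples):

```python
k, c_in, c_out = 3, 64, 128  # example kernel size and channel counts

standard = k * k * c_in * c_out            # standard convolution
separable = k * k * c_in + c_in * c_out    # depthwise + 1x1 pointwise

print(standard, separable, round(standard / separable, 1))  # 73728 8768 8.4
```

An 8x reduction per layer at these sizes is why MobileNet-style models suit low-compute settings.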

pip installing necessary libraries¶

  • visualkeras: Used to display our model architecture in a very visually appealing way.

  • keras-tuner: Used to hypertune our model.

In [ ]:
# pip install visualkeras
In [ ]:
# pip install keras-tuner

Importing of libraries¶

In [ ]:
# OS module for accessing the images
import os

# TensorFlow for building and training deep learning models
import tensorflow as tf
# Pandas for data manipulation and analysis (e.g., reading CSV files, handling dataframes)
import pandas as pd

# NumPy for numerical operations, especially arrays and matrices
import numpy as np

# Matplotlib for plotting and visualizing data and training metrics
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Random module for generating random numbers, useful for reproducibility or data augmentation
import random
import math

# OpenCV for image processing tasks (e.g., reading, transforming images)
import cv2

# Shutil for file operations such as copying, moving, and deleting files and directories
import shutil

# PIL (Python Imaging Library) for image manipulation like resizing, enhancing, or converting
from PIL import Image, ImageEnhance, ImageOps, ImageFont

# Libraries used for removing duplicates
import hashlib
from collections import defaultdict

# Keras Tuner: tools for hyperparameter tuning using Random Search and HyperParameters
from keras_tuner import HyperParameters, RandomSearch

# Scikit-learn metrics for evaluating model performance: confusion matrix, visualization
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report

# Visualizing Model Architecture
import visualkeras
In [ ]:
# Mounting drive to access folders inside
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive

Importing of dataset¶

Here we will use a command in order to unzip the dataset folder.

We uploaded a zipped folder because uploading an unzipped folder would take too long. Unzipping in Colab takes approximately 10 seconds.

In [ ]:
# Command for unzipping zipped folder
!unzip -q /content/drive/MyDrive/Datasets/image_dataset.zip
In [ ]:
dataset_path = "/content/Dataset for CA1 part A - AY2526S1/train"

Exploratory Data Analysis¶

We will begin by conducting an exploratory data analysis of the data, to gain a better understanding of the characteristics of the dataset.

In [ ]:
# Show all available classes in train set, and we establish this as the main class folder names
class_names = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/train"))
print(class_names)
['Bean', 'Bitter_Gourd', 'Brinjal', 'Cabbage', 'Capsicum', 'Cauliflower and Broccoli', 'Cucumber and Bottle_Gourd', 'Potato', 'Pumpkin', 'Radish and Carrot', 'Tomato']
In [ ]:
# Viewing all folder names in each set, to check if there are any differences (which there are)
train_classes = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/train"))
val_classes = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/validation"))
test_classes = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/test"))

print("Train classes:", train_classes)
print("Validation classes:", val_classes)
print("Test classes:", test_classes)
Train classes: ['Bean', 'Bitter_Gourd', 'Brinjal', 'Cabbage', 'Capsicum', 'Cauliflower and Broccoli', 'Cucumber and Bottle_Gourd', 'Potato', 'Pumpkin', 'Radish and Carrot', 'Tomato']
Validation classes: ['Bean', 'Bitter_Gourd', 'Brinjal', 'Cabbage', 'Capsicum', 'Cauliflower with Broccoli', 'Cucumber with Bottle_Gourd', 'Potato', 'Pumpkin', 'Radish with Carrot', 'Tomato']
Test classes: ['Bean', 'Bitter_Gourd', 'Bottle_Gourd and Cucumber', 'Brinjal', 'Broccoli and Cauliflower', 'Cabbage', 'Capsicum (apparently)', 'Carrot and Radish', 'Potato', 'Pumpkin (purportedly)', 'Tomato (ostensibly)']

We notice that some folder names differ between the train and validation sets. For example, 'Cucumber and Bottle_Gourd' in the train dataset versus 'Cucumber with Bottle_Gourd' in the validation dataset. Such mismatches can result in wrong model evaluations.

We also observe wrong names in the test dataset.

In order to analyze this issue more in depth, we will print out the images of each set, the train, validation and test.

Data Visualization¶

Firstly, we take a look at batches of the dataset

In [ ]:
# Number of images to show
# Grid settings: 4 columns, compute rows as needed
num_classes = 11
ncols = 4
nrows = math.ceil(num_classes / ncols)

Training dataset¶

We will observe the coloured version and the grayscale version

In [ ]:
plt.figure(figsize=(ncols * 3, nrows * 3))

for idx, cls in enumerate(class_names):
    class_path = os.path.join(dataset_path, cls)
    # pick one random image from this class
    img_name = random.choice(os.listdir(class_path))
    img_path = os.path.join(class_path, img_name)

    # read & convert to RGB
    image = cv2.imread(img_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    ax = plt.subplot(nrows, ncols, idx + 1)
    ax.imshow(image)
    ax.set_title(cls)
    ax.axis('off')

# If there are any empty subplots (e.g. 12th slot), turn off their axes:
for j in range(idx + 2, nrows * ncols + 1):
    plt.subplot(nrows, ncols, j).axis('off')

plt.tight_layout()
plt.show()
No description has been provided for this image

Next, we show the grayscale version of the train set.

In [ ]:
plt.figure(figsize=(ncols * 3, nrows * 3))

for idx, cls in enumerate(class_names):
    class_path = os.path.join(dataset_path, cls)
    # pick one random image from this class
    img_name = random.choice(os.listdir(class_path))
    img_path = os.path.join(class_path, img_name)

    # read & convert to grayscale
    image = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)

    ax = plt.subplot(nrows, ncols, idx + 1)
    ax.imshow(image, cmap='gray')
    ax.set_title(cls)
    ax.axis('off')

# If there are any empty subplots (e.g. 12th slot), turn off their axes:
for j in range(idx + 2, nrows * ncols + 1):
    plt.subplot(nrows, ncols, j).axis('off')

plt.tight_layout()
plt.show()
No description has been provided for this image

Validation dataset¶

In [ ]:
plt.figure(figsize=(ncols * 3, nrows * 3))

for idx, cls in enumerate(val_classes):
    class_path = os.path.join("/content/Dataset for CA1 part A - AY2526S1/validation", cls)
    # pick one random image from this class
    img_name = random.choice(os.listdir(class_path))
    img_path = os.path.join(class_path, img_name)

    # read & convert to RGB
    image = cv2.imread(img_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    ax = plt.subplot(nrows, ncols, idx + 1)
    ax.imshow(image)
    ax.set_title(cls)
    ax.axis('off')

# If there are any empty subplots (e.g. 12th slot), turn off their axes:
for j in range(idx + 2, nrows * ncols + 1):
    plt.subplot(nrows, ncols, j).axis('off')

plt.tight_layout()
plt.show()
No description has been provided for this image

Next, we show the grayscale version of the validation set.

In [ ]:
plt.figure(figsize=(ncols * 3, nrows * 3))

for idx, cls in enumerate(val_classes):
    class_path = os.path.join("/content/Dataset for CA1 part A - AY2526S1/validation", cls)
    # pick one random image from this class
    img_name = random.choice(os.listdir(class_path))
    img_path = os.path.join(class_path, img_name)

    # read image directly as grayscale
    image = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)

    ax = plt.subplot(nrows, ncols, idx + 1)
    ax.imshow(image, cmap='gray')
    ax.set_title(cls)
    ax.axis('off')

# If there are any empty subplots (e.g. 12th slot), turn off their axes:
for j in range(idx + 2, nrows * ncols + 1):
    plt.subplot(nrows, ncols, j).axis('off')

plt.tight_layout()
plt.show()
No description has been provided for this image

Test dataset¶

In [ ]:
plt.figure(figsize=(ncols * 3, nrows * 3))

for idx, cls in enumerate(test_classes):
    class_path = os.path.join("/content/Dataset for CA1 part A - AY2526S1/test", cls)
    # pick one random image from this class
    img_name = random.choice(os.listdir(class_path))
    img_path = os.path.join(class_path, img_name)

    # read & convert to RGB
    image = cv2.imread(img_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    ax = plt.subplot(nrows, ncols, idx + 1)
    ax.imshow(image)
    ax.set_title(cls)
    ax.axis('off')

# If there are any empty subplots (e.g. 12th slot), turn off their axes:
for j in range(idx + 2, nrows * ncols + 1):
    plt.subplot(nrows, ncols, j).axis('off')

plt.tight_layout()
plt.show()
No description has been provided for this image

We can already observe something wrong here. Tomatoes are labeled as pumpkins, and pumpkins are labeled as tomatoes. There are also incorrectly named folders. We will discuss this later. Next, we show the grayscale version of the test set.

In [ ]:
plt.figure(figsize=(ncols * 3, nrows * 3))

for idx, cls in enumerate(test_classes):
    class_path = os.path.join("/content/Dataset for CA1 part A - AY2526S1/test", cls)
    # pick one random image from this class
    img_name = random.choice(os.listdir(class_path))
    img_path = os.path.join(class_path, img_name)

    # read & convert to grayscale
    image = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)

    ax = plt.subplot(nrows, ncols, idx + 1)
    ax.imshow(image, cmap='gray')
    ax.set_title(cls)
    ax.axis('off')

# If there are any empty subplots (e.g. 12th slot), turn off their axes:
for j in range(idx + 2, nrows * ncols + 1):
    plt.subplot(nrows, ncols, j).axis('off')

plt.tight_layout()
plt.show()
No description has been provided for this image

We notice something terribly wrong in the test set.




For example, the class named Pumpkin (purportedly) contains images of tomatoes. Conversely, the class named Tomato (ostensibly) contains images of pumpkins.

We will need to swap the names of these two classes in order to adhere to proper convention.
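A direct rename would collide if both target names already existed, so a collision-safe swap goes through a temporary name. A small sketch with `os.rename` (the shell commands used below get away with direct renames only because the correct names do not yet exist in the test set):

```python
import os
import tempfile

def swap_dirs(a, b):
    # Swap two directory names via a temporary name to avoid collisions
    tmp = a + "_swap_tmp"
    os.rename(a, tmp)
    os.rename(b, a)
    os.rename(tmp, b)

# Tiny demonstration on throwaway folders
base = tempfile.mkdtemp()
p1, p2 = os.path.join(base, "Pumpkin"), os.path.join(base, "Tomato")
os.makedirs(p1)
os.makedirs(p2)
open(os.path.join(p1, "marker.txt"), "w").close()  # marker starts under "Pumpkin"

swap_dirs(p1, p2)
print(os.path.exists(os.path.join(p2, "marker.txt")))  # True: marker now under "Tomato"
```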




Another example is Capsicum (apparently). The '(apparently)' suffix should not be there, because the folder is in fact full of capsicum images.

There are also the issues that we mentioned earlier.




We will need to fix these names to prevent our models from being evaluated against the wrong labels, and to adhere to a consistent naming convention.

Correcting errors in folder names¶

We will be using commands in order to rename the folders.

In [ ]:
!mv /content/'Dataset for CA1 part A - AY2526S1'/test/"Pumpkin (purportedly)" /content/'Dataset for CA1 part A - AY2526S1'/test/"Tomato"
!mv /content/'Dataset for CA1 part A - AY2526S1'/test/"Tomato (ostensibly)" /content/'Dataset for CA1 part A - AY2526S1'/test/"Pumpkin"
!mv /content/'Dataset for CA1 part A - AY2526S1'/test/"Capsicum (apparently)" /content/'Dataset for CA1 part A - AY2526S1'/test/"Capsicum"
!mv /content/'Dataset for CA1 part A - AY2526S1'/test/"Bottle_Gourd and Cucumber" /content/'Dataset for CA1 part A - AY2526S1'/test/"Cucumber and Bottle_Gourd"
!mv /content/'Dataset for CA1 part A - AY2526S1'/test/"Broccoli and Cauliflower" /content/'Dataset for CA1 part A - AY2526S1'/test/"Cauliflower and Broccoli"
!mv /content/'Dataset for CA1 part A - AY2526S1'/test/"Carrot and Radish" /content/'Dataset for CA1 part A - AY2526S1'/test/"Radish and Carrot"

!mv /content/'Dataset for CA1 part A - AY2526S1'/validation/"Cucumber with Bottle_Gourd" /content/'Dataset for CA1 part A - AY2526S1'/validation/"Cucumber and Bottle_Gourd"
!mv /content/'Dataset for CA1 part A - AY2526S1'/validation/"Cauliflower with Broccoli" /content/'Dataset for CA1 part A - AY2526S1'/validation/"Cauliflower and Broccoli"
!mv /content/'Dataset for CA1 part A - AY2526S1'/validation/"Radish with Carrot" /content/'Dataset for CA1 part A - AY2526S1'/validation/"Radish and Carrot"

Checking for correct folder names:

In [ ]:
train_classes = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/train"))
val_classes = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/validation"))
test_classes = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/test"))

print("Train classes:", train_classes)
print("Validation classes:", val_classes)
print("Test classes:", test_classes)
Train classes: ['Bean', 'Bitter_Gourd', 'Brinjal', 'Cabbage', 'Capsicum', 'Cauliflower and Broccoli', 'Cucumber and Bottle_Gourd', 'Potato', 'Pumpkin', 'Radish and Carrot', 'Tomato']
Validation classes: ['Bean', 'Bitter_Gourd', 'Brinjal', 'Cabbage', 'Capsicum', 'Cauliflower and Broccoli', 'Cucumber and Bottle_Gourd', 'Potato', 'Pumpkin', 'Radish and Carrot', 'Tomato']
Test classes: ['Bean', 'Bitter_Gourd', 'Brinjal', 'Cabbage', 'Capsicum', 'Cauliflower and Broccoli', 'Cucumber and Bottle_Gourd', 'Potato', 'Pumpkin', 'Radish and Carrot', 'Tomato']

Addressing the major issues in the train dataset¶

Earlier, we observed carrot images in the Bean folder. More precisely, there are 11 carrot images in the Bean folder, which can corrupt our model training.

We want to address this now, to avoid complications later when handling issues such as class imbalance.

In [ ]:
# Removing carrots in bean folder
# Directory containing the images
bean_dir = '/content/Dataset for CA1 part A - AY2526S1/train/Bean'

# List of filenames (without .jpg extension) to remove
filenames_to_remove = ['0001', '0002', '0003', '0004',
                       '0017', '0018', '0019', '0020',
                       '0033', '0049', '0050']

# Remove each specified file
for name in filenames_to_remove:
    file_path = os.path.join(bean_dir, f'{name}.jpg')
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"Deleted: {file_path}")
    else:
        print(f"File not found: {file_path}")
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0001.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0002.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0003.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0004.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0017.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0018.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0019.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0020.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0033.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0049.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0050.jpg

Viewing Existing Duplicates¶

In [ ]:
# Step 1: Helper to compute MD5 hash
def file_hash(filepath):
    hasher = hashlib.md5()
    with open(filepath, 'rb') as f:
        buf = f.read()
        hasher.update(buf)
    return hasher.hexdigest()

# Step 2: Scan and collect duplicates
root_dir = '/content/Dataset for CA1 part A - AY2526S1/train'
hash_dict = defaultdict(list)

for folder in os.listdir(root_dir):
    folder_path = os.path.join(root_dir, folder)
    if os.path.isdir(folder_path):
        for file in os.listdir(folder_path):
            if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                path = os.path.join(folder_path, file)
                hash_val = file_hash(path)
                hash_dict[hash_val].append(path)

# Step 3: Display first duplicate pair (if any)
displayed = False
for files in hash_dict.values():
    if len(files) > 1:
        img1 = Image.open(files[0])
        img2 = Image.open(files[1])

        plt.figure(figsize=(15, 4))
        plt.subplot(1, 2, 1)
        plt.imshow(img1)
        plt.title(f'Duplicate 1:\n{files[0]}')
        plt.axis('off')

        plt.subplot(1, 2, 2)
        plt.imshow(img2)
        plt.title(f'Duplicate 2:\n{files[1]}')
        plt.axis('off')

        plt.show()
        displayed = True
        break

if not displayed:
    print("No duplicates found.")
No description has been provided for this image
In [ ]:
# Remove all duplicate files (keep the first file in each group)
duplicates_removed = 0

for files in hash_dict.values():
    if len(files) > 1:
        for duplicate_path in files[1:]:  # skip first, delete the rest
            os.remove(duplicate_path)
            print(f"Deleted duplicate: {duplicate_path}")
            duplicates_removed += 1

print(f"\nTotal duplicates removed: {duplicates_removed}")
Deleted duplicate: /content/Dataset for CA1 part A - AY2526S1/train/Tomato/0602.jpg
Deleted duplicate: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0028.jpg
Deleted duplicate: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0026 - Copy.jpg
Deleted duplicate: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0029.jpg
Deleted duplicate: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0030 - Copy.jpg
Deleted duplicate: /content/Dataset for CA1 part A - AY2526S1/train/Cabbage/0438.jpg

Total duplicates removed: 6

Observing the difference between the different pixel sizes¶

We will show a picture of the original image, a 23x23 image and a 101x101 image.

We will display the colored versions for easier interpretability.

In [ ]:
# Pick a random class
random_class = random.choice(os.listdir(dataset_path))
class_path = os.path.join(dataset_path, random_class)

# Pick a random image from the class
random_image = random.choice(os.listdir(class_path))
image_path = os.path.join(class_path, random_image)

# Read the original image (color)
image = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Resize to 23x23 and 101x101
small_image = cv2.resize(image_rgb, (23, 23))
large_image = cv2.resize(image_rgb, (101, 101))

# Plot side by side
plt.figure(figsize=(15, 8))

# Original
plt.subplot(1, 3, 1)
plt.imshow(image_rgb)
plt.title('Original')
plt.axis('off')

# 23x23
plt.subplot(1, 3, 2)
plt.imshow(small_image)
plt.title('23x23')
plt.axis('off')

# 101x101
plt.subplot(1, 3, 3)
plt.imshow(large_image)
plt.title('101x101')
plt.axis('off')

plt.tight_layout()
plt.show()
No description has been provided for this image

Class Distribution¶

When training a machine learning model, it is always important to check the distribution of classes in the dataset, so we can determine whether any class balancing is needed.

If classes are imbalanced, our model might perform well on some classes but poorly on others.

In [ ]:
class_names = []
n_images_per_class = []

for class_name in os.listdir(dataset_path):
    class_path = os.path.join(dataset_path, class_name)
    if os.path.isdir(class_path):
        n_images = len([f for f in os.listdir(class_path)])
        class_names.append(class_name)
        n_images_per_class.append(n_images)
        print(f"{class_name:15s}: {n_images} images")
Capsicum       : 351 images
Tomato         : 954 images
Bitter_Gourd   : 720 images
Pumpkin        : 814 images
Bean           : 780 images
Brinjal        : 868 images
Cabbage        : 502 images
Cucumber and Bottle_Gourd: 875 images
Radish and Carrot: 504 images
Potato         : 377 images
Cauliflower and Broccoli: 948 images
In [ ]:
# Plotting the bar chart
plt.figure(figsize=(10, 6))
plt.barh(class_names, n_images_per_class, color='skyblue')
plt.xlabel('Number of Images')
plt.ylabel('Class Name')
plt.title('Number of Images per Class')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()  # To make sure everything fits
plt.show()
No description has been provided for this image

We observe that some classes contain significantly more data than others, which may cause our model to perform worse on classes with less data.




To address this, we will add more images to the underrepresented classes, ensuring that each class has the same number of images: 954, the count of the largest class prior to balancing.




How will we achieve this?¶

To achieve this, we will replicate some images from the dataset and apply augmentation techniques to generate new variations of these images. This will help increase the dataset size while reducing the risk of overfitting by introducing more diversity in the data.

In [ ]:
# Paths
balanced_dataset_path = '/content/dataset/balanced_train'     # new folder

# Create balanced folder
os.makedirs(balanced_dataset_path, exist_ok=True)

max_count = max(n_images_per_class)
print(f"Largest class has {max_count} images.")

# Simple augmentation function
def augment_image(img_path):
    img = Image.open(img_path)

    # Random horizontal flip
    if random.random() > 0.5:
        img = ImageOps.mirror(img)

    # Random slight rotation
    angle = random.uniform(-15, 15)
    img = img.rotate(angle)

    # Random brightness adjustment
    enhancer = ImageEnhance.Brightness(img)
    img = enhancer.enhance(random.uniform(0.8, 1.2))

    return img

# For each class
for cls in os.listdir(dataset_path):
    cls_folder = os.path.join(dataset_path, cls)

    # Ensure it's a directory
    if not os.path.isdir(cls_folder):
        continue

    images = os.listdir(cls_folder)
    current_count = len(images)
    print(f'{cls}: {current_count}')

    # Create class folder in balanced dataset
    new_cls_folder = os.path.join(balanced_dataset_path, cls)
    os.makedirs(new_cls_folder, exist_ok=True)

    # First copy all original images
    for img in images:
        src_path = os.path.join(cls_folder, img)
        dst_path = os.path.join(new_cls_folder, img)
        shutil.copyfile(src_path, dst_path)

    # Oversample if needed to match max_count
    if current_count < max_count:
        extra_needed = max_count - current_count
        extra_images = random.choices(images, k=extra_needed)
        print(f"Augmenting {cls} with {extra_needed} images.")

        for idx, img in enumerate(extra_images):
            src_path = os.path.join(cls_folder, img)
            new_img = augment_image(src_path)
            dst_path = os.path.join(new_cls_folder, f"aug_{idx}_{img}")
            new_img.save(dst_path)

print("Dataset oversampled and lightly augmented!")
Largest class has 954 images.
Capsicum: 351
Augmenting Capsicum with 603 images.
Tomato: 954
Bitter_Gourd: 720
Augmenting Bitter_Gourd with 234 images.
Pumpkin: 814
Augmenting Pumpkin with 140 images.
Bean: 780
Augmenting Bean with 174 images.
Brinjal: 868
Augmenting Brinjal with 86 images.
Cabbage: 502
Augmenting Cabbage with 452 images.
Cucumber and Bottle_Gourd: 875
Augmenting Cucumber and Bottle_Gourd with 79 images.
Radish and Carrot: 504
Augmenting Radish and Carrot with 450 images.
Potato: 377
Augmenting Potato with 577 images.
Cauliflower and Broccoli: 948
Augmenting Cauliflower and Broccoli with 6 images.
Dataset oversampled and lightly augmented!
In [ ]:
balanced_class_names = []
balanced_n_images_per_class = []

for class_name in os.listdir(balanced_dataset_path):
    class_path = os.path.join(balanced_dataset_path, class_name)
    if os.path.isdir(class_path):
        n_images = len([f for f in os.listdir(class_path)])
        balanced_class_names.append(class_name)
        balanced_n_images_per_class.append(n_images)
        print(f"{class_name:15s}: {n_images} images")
Capsicum       : 954 images
Tomato         : 954 images
Bitter_Gourd   : 954 images
Pumpkin        : 954 images
Bean           : 954 images
Brinjal        : 954 images
Cabbage        : 954 images
Cucumber and Bottle_Gourd: 954 images
Radish and Carrot: 954 images
Potato         : 954 images
Cauliflower and Broccoli: 954 images

Average Image of Grayscaled Dataset¶

(Note: this shows the average image of the original dataset, not our balanced dataset.)

In [ ]:
# Initialize variables
sum_image = None
count = 0
expected_size = (224, 224)  # Height, width

for cls_name in os.listdir(dataset_path):
    cls_path = os.path.join(dataset_path, cls_name)

    if os.path.isdir(cls_path):
        for img_name in os.listdir(cls_path):
            img_path = os.path.join(cls_path, img_name)
            try:
                # Open and convert image to grayscale
                img = Image.open(img_path).convert("L")  # "L" mode = 8-bit grayscale

                # Resize if needed
                if img.size != (expected_size[1], expected_size[0]):  # PIL uses (width, height)
                    print(f"Skipping {img_path} due to wrong size: {img.size}")
                    continue

                img_array = np.array(img).astype(np.float32)

                if sum_image is None:
                    sum_image = img_array
                else:
                    sum_image += img_array
                count += 1
            except Exception as e:
                print(f"Error processing {img_path}: {e}")

# Calculate average
average_image = sum_image / count
average_image = np.clip(average_image, 0, 255).astype(np.uint8)

# Display the grayscale average image
plt.figure(figsize=(6, 6))
plt.imshow(average_image, cmap='gray')
plt.title('Average Grayscale Image of Dataset')
plt.axis('off')
plt.show()
Skipping /content/Dataset for CA1 part A - AY2526S1/train/Bitter_Gourd/0526.jpg due to wrong size: (224, 205)
Skipping /content/Dataset for CA1 part A - AY2526S1/train/Bitter_Gourd/0609.jpg due to wrong size: (224, 200)
No description has been provided for this image

This is the average image of the dataset. Although it is not clear, we can see that the average color is green.

We also notice that some images are of the wrong size; we will handle that during preprocessing, since right now we are just exploring the dataset. This issue can simply be resolved by loading our images at a fixed target size.

In [ ]:
expected_size = (224, 224)
class_averages = {}

# Loop through each class
for cls_name in os.listdir(dataset_path):
    cls_path = os.path.join(dataset_path, cls_name)

    if os.path.isdir(cls_path):
        images = []
        for img_name in os.listdir(cls_path):
            img_path = os.path.join(cls_path, img_name)
            try:
                img = Image.open(img_path)
                img_array = np.array(img).astype(np.float32)

                # Skip images of wrong size
                if img_array.shape[0:2] != expected_size:
                    continue

                images.append(img_array)
            except Exception as e:
                print(f"Error processing {img_path}: {e}")

        # Only if there are valid images
        if images:
            images = np.stack(images, axis=0)  # Shape: (num_images, height, width, channels)
            avg_image = np.mean(images, axis=0)

            # Convert the average image to grayscale
            gray_avg_image = np.mean(avg_image, axis=2).astype(np.uint8)

            class_averages[cls_name] = gray_avg_image

# Plot the grayscale average image for each class
plt.figure(figsize=(15, 10))

for idx, (cls_name, gray_avg_image) in enumerate(class_averages.items()):
    plt.subplot(3, 4, idx + 1)  # Adjust depending on how many classes you have
    gray_avg_normalized = gray_avg_image / 255.0  # Normalize for visualization
    plt.imshow(gray_avg_normalized, cmap='gray')  # Display in grayscale using cmap='gray'
    plt.title(cls_name)
    plt.axis('off')

plt.tight_layout()
plt.show()
[Figure: grayscale average image for each class]

From these per-class average images, we can observe:




Center Concentration¶

Most classes (e.g., Tomato, Pumpkin, Cabbage, Cauliflower and Broccoli, Potato) have their brightest or darkest intensities at the center, suggesting that the objects are centered in the images across the dataset.

This is typical of datasets where images are preprocessed to center the main object, which is helpful for CNN training.




Circular Shapes¶

Some classes, such as Potato, Tomato, and Bean, show dark central blobs with fading outer regions, indicating a rounded, central shape.

This suggests our model could learn to distinguish these classes based on circularity.




Textures¶

Bitter_Gourd, Brinjal, and Radish and Carrot have more textured, noisy, or irregular patterns, possibly due to variation in object shapes or positioning.

These classes may have higher intra-class variation, which might make them harder to classify; this is a useful insight for interpreting the confusion matrices later.

Data Preprocessing¶

The following code loads both small and large image datasets for training and validation, ensuring that the images are preprocessed into batches, resized, shuffled, and ready for model input.

We also confirm the final shape of the images to verify the resizing process.

In [ ]:
small_train = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/dataset/balanced_train",
    color_mode="grayscale",
    batch_size=32,
    image_size=(23,23), # strictly specified the sizes of the images
    shuffle=True,
    seed=123
)

small_val = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/Dataset for CA1 part A - AY2526S1/validation",
    color_mode="grayscale",
    batch_size=32,
    image_size=(23,23), # strictly specified the sizes of the images
    shuffle=True,
    seed=123
)

large_train = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/dataset/balanced_train",
    color_mode="grayscale",
    batch_size=32,
    image_size=(101,101), # strictly specified the sizes of the images
    shuffle=True,
    seed=123
)

large_val = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/Dataset for CA1 part A - AY2526S1/validation",
    color_mode="grayscale",
    batch_size=32,
    image_size=(101,101), # strictly specified the sizes of the images
    shuffle=True,
    seed=123
)

for batch in small_train.take(1):
    imgs, labels = batch
    print("small:", imgs.shape)

for batch in large_train.take(1):
    imgs, labels = batch
    print("large:", imgs.shape)
Found 10494 files belonging to 11 classes.
Found 2200 files belonging to 11 classes.
Found 10494 files belonging to 11 classes.
Found 2200 files belonging to 11 classes.
small: (32, 23, 23, 1)
large: (32, 101, 101, 1)

Normalization¶

In the original images, pixel values typically range from 0 to 255. By dividing the pixel values by 255.0, we scale them to the [0, 1] range, which helps the neural network learn more effectively.

Why Normalize?¶

Neural networks generally perform better and converge faster when input values are within a smaller, consistent range. This process also helps prevent issues with large gradients and makes the training process more stable.

In [ ]:
def normalize_img(image, label):
    return image / 255.0, label

# Apply normalization
small_train = small_train.map(normalize_img)
small_val = small_val.map(normalize_img)

large_train = large_train.map(normalize_img)
large_val = large_val.map(normalize_img)

Data augmentation¶

We will create two versions of each training set (23x23 and 101x101, each with and without augmentation). Our models will be trained on both the augmented and non-augmented data, and we will check whether augmentation actually improves performance. If not, we will continue with the original non-augmented sets.

Our augmentation applies random flips, rotations, zooms, and contrast adjustments.




Why data augmentation?¶

Data augmentation can potentially:

  • Improve generalization. By augmenting our dataset, we are artificially increasing its size and diversity. This means the model will learn to recognize features in many different contexts.

  • Prevent overfitting. When we have a small dataset, our model can memorize the specific images in the training set. Data augmentation combats overfitting by providing new versions of the same images, so the model sees a broader range of variations and cannot just memorize pixel values.




In summary, data augmentation is a powerful technique to make our model more robust, reduce overfitting, and improve generalization by artificially expanding the dataset with transformations. Therefore, we will be trying out augmentation!

In [ ]:
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomContrast(0.1),
])

We will visualize an example of what our augmentation does.

In [ ]:
# Function to display original + augmented samples
def visualize_augmentation(dataset, title, img_size):
    for images, labels in dataset.take(1):
        sample_image = images[0]
        break

    sample_image_batch = tf.expand_dims(sample_image, 0)
    augmented_images = [data_augmentation(sample_image_batch)[0] for _ in range(5)]

    plt.figure(figsize=(12, 3))
    plt.suptitle(f"{title} - {img_size}x{img_size}", fontsize=14)

    plt.subplot(1, 6, 1)
    plt.imshow(sample_image.numpy().squeeze(), cmap="gray")
    plt.title("Original")
    plt.axis("off")

    for i, aug_img in enumerate(augmented_images):
        plt.subplot(1, 6, i+2)
        plt.imshow(aug_img.numpy().squeeze(), cmap="gray")
        plt.title(f"Aug {i+1}")
        plt.axis("off")

    plt.tight_layout()
    plt.show()

# Visualize for both datasets
visualize_augmentation(small_train, "Small Images", 23)
visualize_augmentation(large_train, "Large Images", 101)
[Figures: original and five augmented samples at 23x23 and at 101x101]

AUTOTUNE¶

AUTOTUNE optimizes data pipeline performance by dynamically adjusting parallel operations for data loading and augmentation.

It improves the efficiency of our data pipeline by allowing parallel data processing, prefetching, and caching, which can dramatically speed up training.

For tasks involving large datasets and complex transformations (like augmenting images), AUTOTUNE helps avoid bottlenecks and makes better use of our hardware.

In [ ]:
AUTOTUNE = tf.data.AUTOTUNE

small_train = small_train.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
small_val = small_val.cache().prefetch(buffer_size=AUTOTUNE)

# Augmented training data. Note: we do not cache after the augmentation map,
# otherwise the random transformations would be computed once and then frozen
# for every subsequent epoch, defeating the purpose of augmentation.
augmented_small_train = (
    small_train
    .map(lambda x, y: (data_augmentation(x, training=True), y), num_parallel_calls=AUTOTUNE)
    .prefetch(buffer_size=AUTOTUNE)
)

large_train = large_train.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
large_val = large_val.cache().prefetch(buffer_size=AUTOTUNE)

# Augmented training data (again without caching after the augmentation map,
# so new random transformations are drawn every epoch)
augmented_large_train = (
    large_train
    .map(lambda x, y: (data_augmentation(x, training=True), y), num_parallel_calls=AUTOTUNE)
    .prefetch(buffer_size=AUTOTUNE)
)

Model Creation¶

We will first create dictionaries that we will later combine into a dataframe comparing all models.

We will use this dataframe to rank the models from best to worst, and then further hyperparameter-tune the best ones.

Let's discuss what models we are going to build:¶




  1. Custom CNN
  • Consists of three convolutional blocks with increasing filter depths.

  • Uses BatchNormalization after each Conv2D layer to stabilize and accelerate training.

  • Dropout (configurable) is used after each block for regularization and overfitting control.

  • Uses GlobalAveragePooling2D instead of flattening to reduce parameter count and encourage generalization.

  • Ends with a dense classifier (128 units to softmax), suitable for 11-class prediction.




  2. VGG-inspired CNN
  • Mimics the VGG-style deep architecture using repeated 3x3 convolutional layers.

  • Sequential stacking of conv layers per block (32, 64, 128), as per VGG philosophy.

  • Employs MaxPooling2D after blocks to downsample spatial dimensions.

  • Uses GlobalAveragePooling2D for compression instead of flattening, a more modern choice.

  • Final Dense layer: 128 units + Dropout before softmax classifier.

In our "mini-VGG" adaptation we simply reduce the number of pooling layers (or delay them) so we never overcompress the 23x23 or 101x101 input, yet still reap the benefits of VGG's depth and locality bias. We will test whether this approach increases our accuracy.




  3. Mini-Resnet-inspired Model
  • Starts with a base convolution followed by custom residual blocks.

  • Each residual block includes two Conv2D layers with BatchNormalization.

  • Skip (identity) connections to avoid vanishing gradients and encourage gradient flow.

  • Employs MaxPooling2D and GlobalAveragePooling2D to reduce computation.

  • Final classification layer: Dense(64) + Dropout to softmax.




  4. Mobilenet-Lite-inspired model
  • Tailored for efficiency: uses SeparableConv2D (depthwise separable convolutions) to drastically reduce parameter count and computation.

  • Filter sizes progress from 32, 64, 128 across blocks.

  • Each block includes: SeparableConv2D, BatchNormalization, MaxPooling or GlobalAveragePooling.

  • Starts with a Rescaling layer to normalize input pixels to [0, 1].

  • Ends with a compact Dense(64) + Dropout(0.3) to softmax.




  5. Mini-Densenet-inspired model
  • Begins with a Conv2D layer (16 filters) followed by two Dense Blocks.

  • Each Dense Block:

  • Contains 3 convolutional layers with a growth rate of 12, uses the BatchNormalization → ReLU → Conv2D ordering, and concatenates features from all previous layers (the key DenseNet trait).

  • Includes MaxPooling2D (1x2) after the first block and GlobalAveragePooling2D at the end.

  • Final classifier: Dense(64) + Dropout(0.3) to softmax.
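To make the skip-connection idea concrete, here is a minimal functional-API sketch of a residual block. The filter counts and the 1x1 projection here are illustrative assumptions, not the exact CA1 architecture:

```python
import tensorflow as tf

def residual_block(x, filters):
    """Two Conv2D layers with BatchNormalization and a skip connection."""
    shortcut = x
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    y = tf.keras.layers.BatchNormalization()(y)
    # If the channel count changes, project the shortcut with a 1x1 conv
    # so the two tensors can be added element-wise.
    if shortcut.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = tf.keras.layers.Add()([shortcut, y])  # the identity/skip connection
    return tf.keras.layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(23, 23, 1))
x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
x = residual_block(x, 32)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(11, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 11)
```

The 1x1 projection on the shortcut is only needed when the channel count changes; otherwise the input is added back unchanged, which is what lets gradients flow directly to earlier layers.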

A list of potential metrics that we can use:¶

  • Accuracy
    It gives us a quick, overall sense of how often our model is right, and it is easy to interpret. Its only downside is sensitivity to class imbalance, which we have already handled.

  • Precision
    A simple way to think is “When the model says this is a carrot, how often is it actually a carrot?”. High precision means few false alarms, which is important if, say, mistaking a toxic flower for an edible vegetable could be dangerous.

  • Recall
    A simple way to think is "Of all the real carrots, how many did I actually detect?". Higher recall means fewer misses, which is critical if we need to catch every instance of a minority or safety-critical class.

  • F1 Score
    This balances precision and recall into a single number. This is useful when we want a trade-off between "false alarms" and "misses" without over-penalizing one.
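All four metrics can be computed in one place with scikit-learn. The labels below are made-up toy predictions, not our model's output:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy ground-truth and predicted class indices for a multi-class problem.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

# "macro" averages the metric over classes, treating all classes equally,
# which suits a balanced dataset like ours.
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1 score :", f1_score(y_true, y_pred, average="macro"))
```

With macro averaging, a rare class that the model handles poorly drags the score down just as much as a common one would, which is why it is the usual choice for per-class-fair comparison.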

What metric will we use?¶

Because we want a quick, easily interpretable comparison between models, we will use accuracy. Accuracy is the easiest metric to interpret, and its only major downside, class imbalance, has already been handled, so we do not need to worry about it. When evaluating our best model after hyperparameter tuning, we will report all four metrics.

In [ ]:
# Dictionary to store history from each model
small_history_dict = {}
large_history_dict = {}

Model callbacks¶

We used:

  • EarlyStopping during model training to monitor validation loss and halt training when no further improvement is observed, preventing overfitting and conserving computational resources. We set start_from_epoch to 10 to give every model at least 10 epochs of training before comparison.

  • ReduceLROnPlateau to reduce the learning rate when the model's performance plateaus (i.e., when the validation loss stops improving).

It automatically adjusts the learning rate based on the progress, ensuring the model doesn't "get stuck" with a learning rate that is too high when improvements are minimal.

In [ ]:
early_stop = tf.keras.callbacks.EarlyStopping(
    patience=3,
    min_delta=0.0005,
    restore_best_weights=True,
    monitor='val_loss',
    start_from_epoch=10
)

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=3, min_lr=1e-6, verbose=1)

Dummy Baseline¶

We will create a dummy model in order to compare how well our models are doing. It has no hidden layers, and it acts like a simple linear classifier.

We can confirm whether our models are actually better than random guessing or naive learning.

In [ ]:
def dummy_baseline_model(input_shape=(23, 23, 1), num_classes=11):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Flatten(),  # Just flatten the image
        tf.keras.layers.Dense(num_classes, activation='softmax')  # No hidden layers
    ])
    return model
In [ ]:
# ------------------------------Small------------------------------
small_dummy_model = dummy_baseline_model()
small_dummy_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

small_dummy_model.summary()

# ------------------------------Large------------------------------
large_dummy_model = dummy_baseline_model(input_shape=(101, 101, 1))
large_dummy_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

large_dummy_model.summary()
Model: "sequential_26"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten_19 (Flatten)            │ (None, 529)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_31 (Dense)                │ (None, 11)             │         5,830 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 5,830 (22.77 KB)
 Trainable params: 5,830 (22.77 KB)
 Non-trainable params: 0 (0.00 B)
Model: "sequential_27"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten_20 (Flatten)            │ (None, 10201)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_32 (Dense)                │ (None, 11)             │       112,222 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 112,222 (438.37 KB)
 Trainable params: 112,222 (438.37 KB)
 Non-trainable params: 0 (0.00 B)
In [ ]:
small_dummy_history = small_dummy_model.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

large_dummy_history = large_dummy_model.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

small_history_dict['Dummy Baseline'] = small_dummy_history.history
large_history_dict['Dummy Baseline'] = large_dummy_history.history
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.1392 - loss: 2.3757 - val_accuracy: 0.2250 - val_loss: 2.2449 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.2396 - loss: 2.1764 - val_accuracy: 0.2168 - val_loss: 2.1897 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.2721 - loss: 2.1165 - val_accuracy: 0.2482 - val_loss: 2.1651 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.2949 - loss: 2.0747 - val_accuracy: 0.2791 - val_loss: 2.1421 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2986 - loss: 2.0437 - val_accuracy: 0.2941 - val_loss: 2.1319 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3127 - loss: 2.0193 - val_accuracy: 0.2895 - val_loss: 2.0913 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.3192 - loss: 2.0059 - val_accuracy: 0.2868 - val_loss: 2.0895 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3374 - loss: 1.9705 - val_accuracy: 0.3082 - val_loss: 2.0793 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3397 - loss: 1.9654 - val_accuracy: 0.3114 - val_loss: 2.0671 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.3580 - loss: 1.9420 - val_accuracy: 0.2764 - val_loss: 2.1007 - learning_rate: 0.0010
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.3485 - loss: 1.9456 - val_accuracy: 0.3132 - val_loss: 2.0599 - learning_rate: 0.0010
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.3571 - loss: 1.9209 - val_accuracy: 0.3118 - val_loss: 2.0679 - learning_rate: 0.0010
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3570 - loss: 1.9192 - val_accuracy: 0.3127 - val_loss: 2.0757 - learning_rate: 0.0010
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.3714 - loss: 1.9011 - val_accuracy: 0.3305 - val_loss: 2.0342 - learning_rate: 0.0010
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.3696 - loss: 1.9050 - val_accuracy: 0.3164 - val_loss: 2.0473 - learning_rate: 0.0010
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3761 - loss: 1.8915 - val_accuracy: 0.3095 - val_loss: 2.0499 - learning_rate: 0.0010
Epoch 17/20
325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.3866 - loss: 1.8708
Epoch 17: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.3865 - loss: 1.8710 - val_accuracy: 0.2955 - val_loss: 2.0731 - learning_rate: 0.0010
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.1804 - loss: 2.7404 - val_accuracy: 0.2205 - val_loss: 2.6266 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2689 - loss: 2.3223 - val_accuracy: 0.2859 - val_loss: 2.2324 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2949 - loss: 2.2145 - val_accuracy: 0.2768 - val_loss: 2.2194 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.3372 - loss: 2.1115 - val_accuracy: 0.2664 - val_loss: 2.4666 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3487 - loss: 2.1101 - val_accuracy: 0.2645 - val_loss: 2.4757 - learning_rate: 0.0010
Epoch 6/20
318/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.3755 - loss: 1.9811
Epoch 6: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3750 - loss: 1.9837 - val_accuracy: 0.2577 - val_loss: 2.5204 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4250 - loss: 1.7545 - val_accuracy: 0.3018 - val_loss: 2.3362 - learning_rate: 5.0000e-04
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.4429 - loss: 1.7163 - val_accuracy: 0.3223 - val_loss: 2.2842 - learning_rate: 5.0000e-04
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.4475 - loss: 1.7216 - val_accuracy: 0.3277 - val_loss: 2.1727 - learning_rate: 5.0000e-04
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.4339 - loss: 1.7262 - val_accuracy: 0.3341 - val_loss: 2.0549 - learning_rate: 5.0000e-04
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4542 - loss: 1.6771 - val_accuracy: 0.2927 - val_loss: 2.2806 - learning_rate: 5.0000e-04
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4684 - loss: 1.6245 - val_accuracy: 0.3159 - val_loss: 2.0921 - learning_rate: 5.0000e-04
Epoch 13/20
325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.4738 - loss: 1.6302
Epoch 13: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4738 - loss: 1.6304 - val_accuracy: 0.3177 - val_loss: 2.1151 - learning_rate: 5.0000e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5246 - loss: 1.4988 - val_accuracy: 0.3391 - val_loss: 2.0769 - learning_rate: 2.5000e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.5225 - loss: 1.4936 - val_accuracy: 0.3155 - val_loss: 2.1732 - learning_rate: 2.5000e-04
Epoch 16/20
310/328 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5293 - loss: 1.4820
Epoch 16: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.5288 - loss: 1.4831 - val_accuracy: 0.3291 - val_loss: 2.1046 - learning_rate: 2.5000e-04
Epoch 17/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.5512 - loss: 1.4184 - val_accuracy: 0.3264 - val_loss: 2.0920 - learning_rate: 1.2500e-04
In [ ]:
# ------------------------------Small------------------------------
aug_small_dummy_model = dummy_baseline_model()
aug_small_dummy_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

aug_small_dummy_model.summary()

# ------------------------------Large------------------------------

aug_large_dummy_model = dummy_baseline_model(input_shape=(101, 101, 1))
aug_large_dummy_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

aug_large_dummy_model.summary()
Model: "sequential_28"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten_21 (Flatten)            │ (None, 529)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_33 (Dense)                │ (None, 11)             │         5,830 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 5,830 (22.77 KB)
 Trainable params: 5,830 (22.77 KB)
 Non-trainable params: 0 (0.00 B)
Model: "sequential_29"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten_22 (Flatten)            │ (None, 10201)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_34 (Dense)                │ (None, 11)             │       112,222 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 112,222 (438.37 KB)
 Trainable params: 112,222 (438.37 KB)
 Non-trainable params: 0 (0.00 B)
In [ ]:
# ------------------------------Small------------------------------
aug_small_dummy_history = aug_small_dummy_model.fit(
    augmented_small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

small_history_dict['Dummy Baseline with Augmented Data'] = aug_small_dummy_history.history


# ------------------------------Large-----------------------------
aug_large_dummy_history = aug_large_dummy_model.fit(
    augmented_large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

large_history_dict['Dummy Baseline with Augmented Data'] = aug_large_dummy_history.history
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.1344 - loss: 2.3903 - val_accuracy: 0.1632 - val_loss: 2.3076 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.1764 - loss: 2.2907 - val_accuracy: 0.1755 - val_loss: 2.2851 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.1910 - loss: 2.2652 - val_accuracy: 0.1845 - val_loss: 2.2749 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.1998 - loss: 2.2496 - val_accuracy: 0.1914 - val_loss: 2.2691 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2076 - loss: 2.2383 - val_accuracy: 0.1968 - val_loss: 2.2654 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2134 - loss: 2.2292 - val_accuracy: 0.1991 - val_loss: 2.2628 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2169 - loss: 2.2214 - val_accuracy: 0.2009 - val_loss: 2.2609 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2234 - loss: 2.2145 - val_accuracy: 0.1995 - val_loss: 2.2594 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.2261 - loss: 2.2083 - val_accuracy: 0.2027 - val_loss: 2.2582 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.2320 - loss: 2.2026 - val_accuracy: 0.2064 - val_loss: 2.2572 - learning_rate: 0.0010
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.2345 - loss: 2.1973 - val_accuracy: 0.2073 - val_loss: 2.2564 - learning_rate: 0.0010
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2371 - loss: 2.1924 - val_accuracy: 0.2095 - val_loss: 2.2556 - learning_rate: 0.0010
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.2393 - loss: 2.1877 - val_accuracy: 0.2100 - val_loss: 2.2550 - learning_rate: 0.0010
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.2406 - loss: 2.1833 - val_accuracy: 0.2086 - val_loss: 2.2544 - learning_rate: 0.0010
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.2449 - loss: 2.1791 - val_accuracy: 0.2091 - val_loss: 2.2540 - learning_rate: 0.0010
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.2476 - loss: 2.1751 - val_accuracy: 0.2114 - val_loss: 2.2536 - learning_rate: 0.0010
Epoch 17/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.2492 - loss: 2.1713 - val_accuracy: 0.2114 - val_loss: 2.2532 - learning_rate: 0.0010
Epoch 18/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.2510 - loss: 2.1677 - val_accuracy: 0.2118 - val_loss: 2.2529 - learning_rate: 0.0010
Epoch 19/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2531 - loss: 2.1642 - val_accuracy: 0.2114 - val_loss: 2.2527 - learning_rate: 0.0010
Epoch 20/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.2553 - loss: 2.1609 - val_accuracy: 0.2123 - val_loss: 2.2525 - learning_rate: 0.0010
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.1300 - loss: 3.1514 - val_accuracy: 0.1295 - val_loss: 2.9608 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.1753 - loss: 2.6890 - val_accuracy: 0.1409 - val_loss: 3.0333 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.1910 - loss: 2.6381 - val_accuracy: 0.1455 - val_loss: 3.0445 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.2086 - loss: 2.5955
Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.2086 - loss: 2.5954 - val_accuracy: 0.1514 - val_loss: 3.0352 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.2292 - loss: 2.3724 - val_accuracy: 0.1573 - val_loss: 2.4807 - learning_rate: 5.0000e-04
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2367 - loss: 2.2796 - val_accuracy: 0.1582 - val_loss: 2.4932 - learning_rate: 5.0000e-04
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2421 - loss: 2.2647 - val_accuracy: 0.1595 - val_loss: 2.5024 - learning_rate: 5.0000e-04
Epoch 8/20
319/328 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.2487 - loss: 2.2499
Epoch 8: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.2487 - loss: 2.2495 - val_accuracy: 0.1636 - val_loss: 2.5108 - learning_rate: 5.0000e-04
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.2978 - loss: 2.0893 - val_accuracy: 0.1741 - val_loss: 2.3709 - learning_rate: 2.5000e-04
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3068 - loss: 2.0652 - val_accuracy: 0.1750 - val_loss: 2.3708 - learning_rate: 2.5000e-04
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3068 - loss: 2.0597 - val_accuracy: 0.1727 - val_loss: 2.3711 - learning_rate: 2.5000e-04
Epoch 12/20
327/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.3099 - loss: 2.0537
Epoch 12: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3099 - loss: 2.0536 - val_accuracy: 0.1759 - val_loss: 2.3720 - learning_rate: 2.5000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3249 - loss: 1.9859 - val_accuracy: 0.1741 - val_loss: 2.3616 - learning_rate: 1.2500e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3331 - loss: 1.9811 - val_accuracy: 0.1732 - val_loss: 2.3636 - learning_rate: 1.2500e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3347 - loss: 1.9765 - val_accuracy: 0.1736 - val_loss: 2.3656 - learning_rate: 1.2500e-04
Epoch 16/20
320/328 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.3369 - loss: 1.9721
Epoch 16: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3368 - loss: 1.9720 - val_accuracy: 0.1714 - val_loss: 2.3675 - learning_rate: 1.2500e-04

Our dummy baseline reached a training accuracy of roughly 0.37 and a validation accuracy of roughly 0.33 at its best-weights epoch, so we will use these figures as the bar that our real models must clear.

We also did a quick comparison of how the dummy model performs on augmented data: it performed worse, with validation accuracy dropping to roughly 0.17–0.21.
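For context, with 11 balanced classes, uniform random guessing would score only about 9% accuracy, so even the dummy linear classifier is learning something. A quick sanity check:

```python
num_classes = 11

# Expected accuracy of uniform random guessing on a balanced dataset.
chance_accuracy = 1 / num_classes
print(f"Chance accuracy: {chance_accuracy:.3f}")  # 0.091

# The dummy baseline's ~0.33 validation accuracy is well above chance,
# confirming that even a linear model extracts some signal from the pixels.
print(0.33 / chance_accuracy)  # roughly 3.6x better than chance
```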

1. Custom CNN¶

  • Consists of three convolutional blocks with increasing filter depths.

  • Uses BatchNormalization after each Conv2D layer to stabilize and accelerate training.

  • Dropout (configurable) is used after each block for regularization and overfitting control.

  • Uses GlobalAveragePooling2D instead of flattening to reduce parameter count and encourage generalization.

  • Ends with a dense classifier (128 units to softmax), suitable for 11-class prediction.

In [ ]:
def custom_cnn(input_shape=(23, 23, 1), num_classes=11,
               dropout_rate=0.5,
               filters_block1=32,
               filters_block2=64,
               dense_units=128):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),

        # Block 1
        tf.keras.layers.Conv2D(filters_block1, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters_block1, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(dropout_rate),

        # Block 2 + Pooling
        tf.keras.layers.Conv2D(filters_block2, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters_block2, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(dropout_rate),

        # Block 3
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(dropout_rate),

        # Classifier
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(dense_units, activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(num_classes, activation='softmax')
    ])
    return model
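The parameter counts in the summaries below can be verified by hand. A minimal sketch using the standard Keras parameter-count formulas (not tied to this notebook's code):

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Conv2D with bias: (k*k*in_ch + 1) * out_ch weights."""
    return (kernel * kernel * in_ch + 1) * out_ch

def batchnorm_params(channels):
    """BatchNormalization: gamma, beta, moving mean, moving variance."""
    return 4 * channels

def dense_params(in_units, out_units):
    """Dense with bias: (in_units + 1) * out_units weights."""
    return (in_units + 1) * out_units

total = (
    conv2d_params(3, 1, 32) + batchnorm_params(32) +      # 320 + 128
    conv2d_params(3, 32, 32) + batchnorm_params(32) +     # 9,248 + 128
    conv2d_params(3, 32, 64) + batchnorm_params(64) +     # 18,496 + 256
    conv2d_params(3, 64, 64) + batchnorm_params(64) +     # 36,928 + 256
    conv2d_params(3, 64, 128) + batchnorm_params(128) +   # 73,856 + 512
    conv2d_params(3, 128, 128) + batchnorm_params(128) +  # 147,584 + 512
    dense_params(128, 128) +                              # 16,512
    dense_params(128, 11)                                 # 1,419
)
print(total)  # 306155, matching the model summaries
```

Because GlobalAveragePooling2D has no parameters and convolution weights are independent of the input size, the 23x23 and 101x101 models share the same total of 306,155 parameters, as the summaries confirm.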
In [ ]:
# ------------------------------Small------------------------------
small_cnn = custom_cnn()
small_cnn.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

small_cnn.summary()


# ------------------------------Large------------------------------
large_cnn = custom_cnn(input_shape=(101, 101, 1))
large_cnn.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

large_cnn.summary()
Model: "sequential_42"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d_160 (Conv2D)             │ (None, 23, 23, 32)     │           320 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_124         │ (None, 23, 23, 32)     │           128 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_161 (Conv2D)             │ (None, 23, 23, 32)     │         9,248 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_125         │ (None, 23, 23, 32)     │           128 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_54 (Dropout)            │ (None, 23, 23, 32)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_162 (Conv2D)             │ (None, 23, 23, 64)     │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_126         │ (None, 23, 23, 64)     │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_163 (Conv2D)             │ (None, 23, 23, 64)     │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_127         │ (None, 23, 23, 64)     │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_40 (MaxPooling2D) │ (None, 11, 11, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_55 (Dropout)            │ (None, 11, 11, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_164 (Conv2D)             │ (None, 11, 11, 128)    │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_128         │ (None, 11, 11, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_165 (Conv2D)             │ (None, 11, 11, 128)    │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_129         │ (None, 11, 11, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_56 (Dropout)            │ (None, 11, 11, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_30     │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_83 (Dense)                │ (None, 128)            │        16,512 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_57 (Dropout)            │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_84 (Dense)                │ (None, 11)             │         1,419 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 306,155 (1.17 MB)
 Trainable params: 305,259 (1.16 MB)
 Non-trainable params: 896 (3.50 KB)
Model: "sequential_43"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d_166 (Conv2D)             │ (None, 101, 101, 32)   │           320 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_130         │ (None, 101, 101, 32)   │           128 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_167 (Conv2D)             │ (None, 101, 101, 32)   │         9,248 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_131         │ (None, 101, 101, 32)   │           128 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_58 (Dropout)            │ (None, 101, 101, 32)   │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_168 (Conv2D)             │ (None, 101, 101, 64)   │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_132         │ (None, 101, 101, 64)   │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_169 (Conv2D)             │ (None, 101, 101, 64)   │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_133         │ (None, 101, 101, 64)   │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_41 (MaxPooling2D) │ (None, 50, 50, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_59 (Dropout)            │ (None, 50, 50, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_170 (Conv2D)             │ (None, 50, 50, 128)    │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_134         │ (None, 50, 50, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_171 (Conv2D)             │ (None, 50, 50, 128)    │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_135         │ (None, 50, 50, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_60 (Dropout)            │ (None, 50, 50, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_31     │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_85 (Dense)                │ (None, 128)            │        16,512 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_61 (Dropout)            │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_86 (Dense)                │ (None, 11)             │         1,419 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 306,155 (1.17 MB)
 Trainable params: 305,259 (1.16 MB)
 Non-trainable params: 896 (3.50 KB)
In [ ]:
# ------------------------------Small------------------------------
small_cnn_history = small_cnn.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

small_history_dict['Custom CNN'] = small_cnn_history.history

# ------------------------------Large------------------------------
large_cnn_history = large_cnn.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

large_history_dict['Custom CNN'] = large_cnn_history.history
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 26ms/step - accuracy: 0.2545 - loss: 2.1011 - val_accuracy: 0.1382 - val_loss: 6.2379 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.5043 - loss: 1.4389 - val_accuracy: 0.5141 - val_loss: 1.4404 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6197 - loss: 1.1241 - val_accuracy: 0.6441 - val_loss: 1.0432 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6778 - loss: 0.9443 - val_accuracy: 0.6355 - val_loss: 1.2838 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.7387 - loss: 0.7848 - val_accuracy: 0.6841 - val_loss: 0.9091 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7774 - loss: 0.6708 - val_accuracy: 0.8109 - val_loss: 0.5668 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.8002 - loss: 0.6115 - val_accuracy: 0.7800 - val_loss: 0.6434 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.8108 - loss: 0.5541 - val_accuracy: 0.8495 - val_loss: 0.4859 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.8505 - loss: 0.4592 - val_accuracy: 0.8527 - val_loss: 0.4294 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.8557 - loss: 0.4390 - val_accuracy: 0.8595 - val_loss: 0.4123 - learning_rate: 0.0010
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.8640 - loss: 0.4179 - val_accuracy: 0.7750 - val_loss: 0.7229 - learning_rate: 0.0010
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.8845 - loss: 0.3590 - val_accuracy: 0.8882 - val_loss: 0.3677 - learning_rate: 0.0010
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.8819 - loss: 0.3482 - val_accuracy: 0.8755 - val_loss: 0.4039 - learning_rate: 0.0010
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.8854 - loss: 0.3517 - val_accuracy: 0.8850 - val_loss: 0.3799 - learning_rate: 0.0010
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.9046 - loss: 0.2838 - val_accuracy: 0.8868 - val_loss: 0.3223 - learning_rate: 0.0010
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 8ms/step - accuracy: 0.9029 - loss: 0.2942 - val_accuracy: 0.8073 - val_loss: 0.7714 - learning_rate: 0.0010
Epoch 17/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.9146 - loss: 0.2556 - val_accuracy: 0.9105 - val_loss: 0.2972 - learning_rate: 0.0010
Epoch 18/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9149 - loss: 0.2439 - val_accuracy: 0.9086 - val_loss: 0.2623 - learning_rate: 0.0010
Epoch 19/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.9217 - loss: 0.2417 - val_accuracy: 0.8768 - val_loss: 0.4071 - learning_rate: 0.0010
Epoch 20/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.9256 - loss: 0.2263 - val_accuracy: 0.9077 - val_loss: 0.2976 - learning_rate: 0.0010
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 43s 101ms/step - accuracy: 0.3177 - loss: 1.9476 - val_accuracy: 0.0909 - val_loss: 8.0101 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 30s 82ms/step - accuracy: 0.5992 - loss: 1.1908 - val_accuracy: 0.4718 - val_loss: 1.8686 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.7616 - loss: 0.7423 - val_accuracy: 0.3195 - val_loss: 10.0317 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 27s 83ms/step - accuracy: 0.8253 - loss: 0.5542 - val_accuracy: 0.6764 - val_loss: 1.1672 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.8704 - loss: 0.4077 - val_accuracy: 0.7023 - val_loss: 1.2906 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 81ms/step - accuracy: 0.8969 - loss: 0.3320 - val_accuracy: 0.8000 - val_loss: 0.6983 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 81ms/step - accuracy: 0.9131 - loss: 0.2731 - val_accuracy: 0.6836 - val_loss: 1.8680 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9193 - loss: 0.2498 - val_accuracy: 0.9182 - val_loss: 0.2457 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 80ms/step - accuracy: 0.9349 - loss: 0.2012 - val_accuracy: 0.9014 - val_loss: 0.2852 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 42s 81ms/step - accuracy: 0.9532 - loss: 0.1499 - val_accuracy: 0.6668 - val_loss: 2.6934 - learning_rate: 0.0010
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 75ms/step - accuracy: 0.9452 - loss: 0.1748
Epoch 11: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9452 - loss: 0.1747 - val_accuracy: 0.8609 - val_loss: 0.4725 - learning_rate: 0.0010
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 79ms/step - accuracy: 0.9652 - loss: 0.1106 - val_accuracy: 0.9464 - val_loss: 0.1648 - learning_rate: 5.0000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9754 - loss: 0.0858 - val_accuracy: 0.9309 - val_loss: 0.6129 - learning_rate: 5.0000e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9788 - loss: 0.0787 - val_accuracy: 0.8882 - val_loss: 0.4159 - learning_rate: 5.0000e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 27s 83ms/step - accuracy: 0.9776 - loss: 0.0797 - val_accuracy: 0.9627 - val_loss: 0.1272 - learning_rate: 5.0000e-04
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 79ms/step - accuracy: 0.9780 - loss: 0.0729 - val_accuracy: 0.9605 - val_loss: 0.1273 - learning_rate: 5.0000e-04
Epoch 17/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 80ms/step - accuracy: 0.9810 - loss: 0.0674 - val_accuracy: 0.9732 - val_loss: 0.0978 - learning_rate: 5.0000e-04
Epoch 18/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 42s 84ms/step - accuracy: 0.9821 - loss: 0.0598 - val_accuracy: 0.8868 - val_loss: 0.8896 - learning_rate: 5.0000e-04
Epoch 19/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9788 - loss: 0.0700 - val_accuracy: 0.9327 - val_loss: 0.2111 - learning_rate: 5.0000e-04
Epoch 20/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 75ms/step - accuracy: 0.9810 - loss: 0.0574
Epoch 20: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 79ms/step - accuracy: 0.9810 - loss: 0.0574 - val_accuracy: 0.9673 - val_loss: 0.1011 - learning_rate: 5.0000e-04
In [ ]:
# ------------------------------Small------------------------------
aug_small_cnn = custom_cnn()
aug_small_cnn.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

aug_small_cnn.summary()

# ------------------------------Large------------------------------
aug_large_cnn = custom_cnn(input_shape=(101, 101, 1))
aug_large_cnn.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

aug_large_cnn.summary()
Model: "sequential_32"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d_48 (Conv2D)              │ (None, 23, 23, 32)     │           320 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_36          │ (None, 23, 23, 32)     │           128 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_49 (Conv2D)              │ (None, 23, 23, 32)     │         9,248 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_37          │ (None, 23, 23, 32)     │           128 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_26 (Dropout)            │ (None, 23, 23, 32)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_50 (Conv2D)              │ (None, 23, 23, 64)     │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_38          │ (None, 23, 23, 64)     │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_51 (Conv2D)              │ (None, 23, 23, 64)     │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_39          │ (None, 23, 23, 64)     │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_10 (MaxPooling2D) │ (None, 11, 11, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_27 (Dropout)            │ (None, 11, 11, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_52 (Conv2D)              │ (None, 11, 11, 128)    │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_40          │ (None, 11, 11, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_53 (Conv2D)              │ (None, 11, 11, 128)    │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_41          │ (None, 11, 11, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_28 (Dropout)            │ (None, 11, 11, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_8      │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_39 (Dense)                │ (None, 128)            │        16,512 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_29 (Dropout)            │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_40 (Dense)                │ (None, 11)             │         1,419 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 306,155 (1.17 MB)
 Trainable params: 305,259 (1.16 MB)
 Non-trainable params: 896 (3.50 KB)
Model: "sequential_33"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d_54 (Conv2D)              │ (None, 101, 101, 32)   │           320 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_42          │ (None, 101, 101, 32)   │           128 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_55 (Conv2D)              │ (None, 101, 101, 32)   │         9,248 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_43          │ (None, 101, 101, 32)   │           128 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_30 (Dropout)            │ (None, 101, 101, 32)   │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_56 (Conv2D)              │ (None, 101, 101, 64)   │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_44          │ (None, 101, 101, 64)   │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_57 (Conv2D)              │ (None, 101, 101, 64)   │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_45          │ (None, 101, 101, 64)   │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_11 (MaxPooling2D) │ (None, 50, 50, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_31 (Dropout)            │ (None, 50, 50, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_58 (Conv2D)              │ (None, 50, 50, 128)    │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_46          │ (None, 50, 50, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_59 (Conv2D)              │ (None, 50, 50, 128)    │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_47          │ (None, 50, 50, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_32 (Dropout)            │ (None, 50, 50, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_9      │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_41 (Dense)                │ (None, 128)            │        16,512 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_33 (Dropout)            │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_42 (Dense)                │ (None, 11)             │         1,419 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 306,155 (1.17 MB)
 Trainable params: 305,259 (1.16 MB)
 Non-trainable params: 896 (3.50 KB)
In [ ]:
# ------------------------------Small------------------------------
aug_small_cnn_history = aug_small_cnn.fit(
    augmented_small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

small_history_dict['Custom CNN with Augmented Data'] = aug_small_cnn_history.history

# ------------------------------Large------------------------------
aug_large_cnn_history = aug_large_cnn.fit(
    augmented_large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

large_history_dict['Custom CNN with Augmented Data'] = aug_large_cnn_history.history
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 26ms/step - accuracy: 0.2246 - loss: 2.1962 - val_accuracy: 0.0836 - val_loss: 6.3724 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.3892 - loss: 1.7874 - val_accuracy: 0.4377 - val_loss: 1.7344 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.4878 - loss: 1.4998 - val_accuracy: 0.3182 - val_loss: 2.0995 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.5421 - loss: 1.3568 - val_accuracy: 0.3200 - val_loss: 2.6029 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.5839 - loss: 1.2419 - val_accuracy: 0.4868 - val_loss: 1.6488 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6251 - loss: 1.1129 - val_accuracy: 0.5114 - val_loss: 1.5459 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6454 - loss: 1.0462 - val_accuracy: 0.4277 - val_loss: 2.1435 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.6740 - loss: 0.9788 - val_accuracy: 0.4050 - val_loss: 2.3443 - learning_rate: 0.0010
Epoch 9/20
321/328 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6891 - loss: 0.9172
Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 8ms/step - accuracy: 0.6894 - loss: 0.9165 - val_accuracy: 0.5018 - val_loss: 1.7964 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7273 - loss: 0.8165 - val_accuracy: 0.5491 - val_loss: 1.6418 - learning_rate: 5.0000e-04
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7431 - loss: 0.7490 - val_accuracy: 0.5014 - val_loss: 1.9048 - learning_rate: 5.0000e-04
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.7603 - loss: 0.7094 - val_accuracy: 0.5886 - val_loss: 1.4703 - learning_rate: 5.0000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7647 - loss: 0.7052 - val_accuracy: 0.5423 - val_loss: 1.8088 - learning_rate: 5.0000e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7753 - loss: 0.6705 - val_accuracy: 0.5409 - val_loss: 1.9494 - learning_rate: 5.0000e-04
Epoch 15/20
326/328 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7827 - loss: 0.6329
Epoch 15: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7828 - loss: 0.6328 - val_accuracy: 0.5195 - val_loss: 2.0118 - learning_rate: 5.0000e-04
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 44s 101ms/step - accuracy: 0.2955 - loss: 2.0076 - val_accuracy: 0.0950 - val_loss: 4.9160 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 29s 82ms/step - accuracy: 0.5431 - loss: 1.3448 - val_accuracy: 0.4055 - val_loss: 1.7428 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 40s 80ms/step - accuracy: 0.6368 - loss: 1.0598 - val_accuracy: 0.5264 - val_loss: 1.7484 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 27s 82ms/step - accuracy: 0.7187 - loss: 0.8396 - val_accuracy: 0.2568 - val_loss: 10.8765 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 75ms/step - accuracy: 0.7728 - loss: 0.7064
Epoch 5: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 40s 80ms/step - accuracy: 0.7729 - loss: 0.7063 - val_accuracy: 0.5050 - val_loss: 4.2580 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 28s 85ms/step - accuracy: 0.8302 - loss: 0.5357 - val_accuracy: 0.6341 - val_loss: 1.4416 - learning_rate: 5.0000e-04
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.8590 - loss: 0.4427 - val_accuracy: 0.5877 - val_loss: 1.6160 - learning_rate: 5.0000e-04
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 80ms/step - accuracy: 0.8706 - loss: 0.3970 - val_accuracy: 0.5382 - val_loss: 2.2017 - learning_rate: 5.0000e-04
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step - accuracy: 0.8829 - loss: 0.3618
Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 28s 86ms/step - accuracy: 0.8829 - loss: 0.3618 - val_accuracy: 0.6323 - val_loss: 1.6146 - learning_rate: 5.0000e-04
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9127 - loss: 0.2915 - val_accuracy: 0.7473 - val_loss: 0.8932 - learning_rate: 2.5000e-04
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 80ms/step - accuracy: 0.9170 - loss: 0.2635 - val_accuracy: 0.7395 - val_loss: 0.9783 - learning_rate: 2.5000e-04
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 42s 85ms/step - accuracy: 0.9211 - loss: 0.2479 - val_accuracy: 0.7559 - val_loss: 0.8543 - learning_rate: 2.5000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9231 - loss: 0.2409 - val_accuracy: 0.7414 - val_loss: 0.9867 - learning_rate: 2.5000e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9297 - loss: 0.2240 - val_accuracy: 0.7495 - val_loss: 0.9265 - learning_rate: 2.5000e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 76ms/step - accuracy: 0.9326 - loss: 0.2111
Epoch 15: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9327 - loss: 0.2111 - val_accuracy: 0.7341 - val_loss: 1.0314 - learning_rate: 2.5000e-04

We noticed that our custom CNN also performed worse on the augmented dataset: the best validation accuracy dropped from about 0.91 to 0.59 on the small images, and from about 0.97 to 0.76 on the large images.
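Such comparisons can be pulled programmatically from the stored histories. A minimal sketch, assuming each entry of `small_history_dict` / `large_history_dict` is a Keras `History.history` dict (i.e. it maps metric names to per-epoch lists); the toy histories below are hypothetical stand-ins:

```python
def best_val_accuracy(history):
    """Return the best validation accuracy recorded in a History.history dict."""
    return max(history['val_accuracy'])

# Hypothetical usage with toy histories standing in for the real dicts:
histories = {
    'Custom CNN': {'val_accuracy': [0.64, 0.81, 0.91]},
    'Custom CNN with Augmented Data': {'val_accuracy': [0.44, 0.51, 0.59]},
}
for name, hist in histories.items():
    print(f"{name}: best val_accuracy = {best_val_accuracy(hist):.2f}")
```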

2. VGG-inspired CNN¶

  1. VGG-inspired CNN
  • Mimics the VGG-style deep architecture using repeated 3x3 convolutional layers.

  • Sequential stacking of conv layers per block (32, 64, 128), as per VGG philosophy.

  • Employs MaxPooling2D after blocks to downsample spatial dimensions.

  • Uses GlobalAveragePooling2D for compression instead of flattening, a more modern choice that avoids a large flattened dense layer.

  • Final Dense layer: 128 units + Dropout before softmax classifier.

In [ ]:
def vgg_model(input_shape=(23, 23, 1), num_classes=11):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),

        # Block 1: two stacked 3x3 convs; pool the width only,
        # since the 23x23 grayscale input is already small
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D(pool_size=(1, 2)),

        # Block 2: double the filters, halve both spatial dimensions
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),

        # Block 3: 128 filters, compressed by global average pooling
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.GlobalAveragePooling2D(),

        # Classifier head
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation='softmax')
    ])
    return model
In [ ]:
# ------------------------------Small------------------------------
small_vgg = vgg_model()
small_vgg.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
small_vgg.summary()

# ------------------------------Large------------------------------
large_vgg = vgg_model(input_shape=(101, 101, 1))
large_vgg.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
large_vgg.summary()
Model: "sequential_44"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d_172 (Conv2D)             │ (None, 23, 23, 32)     │           320 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_173 (Conv2D)             │ (None, 23, 23, 32)     │         9,248 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_42 (MaxPooling2D) │ (None, 23, 11, 32)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_174 (Conv2D)             │ (None, 23, 11, 64)     │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_175 (Conv2D)             │ (None, 23, 11, 64)     │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_43 (MaxPooling2D) │ (None, 11, 5, 64)      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_176 (Conv2D)             │ (None, 11, 5, 128)     │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_177 (Conv2D)             │ (None, 11, 5, 128)     │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_32     │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_87 (Dense)                │ (None, 128)            │        16,512 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_62 (Dropout)            │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_88 (Dense)                │ (None, 11)             │         1,419 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 304,363 (1.16 MB)
 Trainable params: 304,363 (1.16 MB)
 Non-trainable params: 0 (0.00 B)
Model: "sequential_45"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d_178 (Conv2D)             │ (None, 101, 101, 32)   │           320 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_179 (Conv2D)             │ (None, 101, 101, 32)   │         9,248 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_44 (MaxPooling2D) │ (None, 101, 50, 32)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_180 (Conv2D)             │ (None, 101, 50, 64)    │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_181 (Conv2D)             │ (None, 101, 50, 64)    │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_45 (MaxPooling2D) │ (None, 50, 25, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_182 (Conv2D)             │ (None, 50, 25, 128)    │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_183 (Conv2D)             │ (None, 50, 25, 128)    │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_33     │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_89 (Dense)                │ (None, 128)            │        16,512 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_63 (Dropout)            │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_90 (Dense)                │ (None, 11)             │         1,419 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 304,363 (1.16 MB)
 Trainable params: 304,363 (1.16 MB)
 Non-trainable params: 0 (0.00 B)
In [ ]:
# ------------------------------Small------------------------------
small_vgg_history = small_vgg.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

small_history_dict['VGG'] = small_vgg_history.history


# ------------------------------Large------------------------------
large_vgg_history = large_vgg.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

large_history_dict['VGG'] = large_vgg_history.history
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 8s 12ms/step - accuracy: 0.0937 - loss: 2.3984 - val_accuracy: 0.1082 - val_loss: 2.3728 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.1666 - loss: 2.2724 - val_accuracy: 0.2264 - val_loss: 2.1362 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.2917 - loss: 1.9686 - val_accuracy: 0.3050 - val_loss: 1.9496 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.3639 - loss: 1.8303 - val_accuracy: 0.4400 - val_loss: 1.6719 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.4139 - loss: 1.6880 - val_accuracy: 0.4409 - val_loss: 1.6270 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4706 - loss: 1.5812 - val_accuracy: 0.4509 - val_loss: 1.5683 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.5217 - loss: 1.4351 - val_accuracy: 0.5718 - val_loss: 1.2660 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.5681 - loss: 1.2985 - val_accuracy: 0.5982 - val_loss: 1.2093 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6320 - loss: 1.1293 - val_accuracy: 0.6655 - val_loss: 0.9910 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.6901 - loss: 0.9493 - val_accuracy: 0.6973 - val_loss: 0.8918 - learning_rate: 0.0010
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.7267 - loss: 0.8203 - val_accuracy: 0.7309 - val_loss: 0.7952 - learning_rate: 0.0010
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7566 - loss: 0.7246 - val_accuracy: 0.7900 - val_loss: 0.6333 - learning_rate: 0.0010
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7943 - loss: 0.6244 - val_accuracy: 0.7927 - val_loss: 0.6316 - learning_rate: 0.0010
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8336 - loss: 0.5111 - val_accuracy: 0.7777 - val_loss: 0.6660 - learning_rate: 0.0010
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8473 - loss: 0.4631 - val_accuracy: 0.8182 - val_loss: 0.5632 - learning_rate: 0.0010
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8662 - loss: 0.4065 - val_accuracy: 0.8023 - val_loss: 0.6385 - learning_rate: 0.0010
Epoch 17/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8835 - loss: 0.3613 - val_accuracy: 0.8368 - val_loss: 0.5407 - learning_rate: 0.0010
Epoch 18/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8926 - loss: 0.3304 - val_accuracy: 0.8550 - val_loss: 0.4841 - learning_rate: 0.0010
Epoch 19/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9146 - loss: 0.2622 - val_accuracy: 0.8518 - val_loss: 0.4823 - learning_rate: 0.0010
Epoch 20/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9162 - loss: 0.2543 - val_accuracy: 0.8441 - val_loss: 0.4845 - learning_rate: 0.0010
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 51ms/step - accuracy: 0.1026 - loss: 2.3887 - val_accuracy: 0.1782 - val_loss: 2.2549 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 43ms/step - accuracy: 0.1848 - loss: 2.2529 - val_accuracy: 0.2182 - val_loss: 2.2171 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 43ms/step - accuracy: 0.2265 - loss: 2.2034 - val_accuracy: 0.2200 - val_loss: 2.1850 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.2882 - loss: 2.0776 - val_accuracy: 0.3200 - val_loss: 1.9823 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 43ms/step - accuracy: 0.3492 - loss: 1.9127 - val_accuracy: 0.3418 - val_loss: 1.9219 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 43ms/step - accuracy: 0.3804 - loss: 1.7930 - val_accuracy: 0.4241 - val_loss: 1.7235 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.4301 - loss: 1.6564 - val_accuracy: 0.4918 - val_loss: 1.5147 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.4712 - loss: 1.5569 - val_accuracy: 0.5673 - val_loss: 1.3011 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.5408 - loss: 1.3848 - val_accuracy: 0.6782 - val_loss: 0.9924 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.5887 - loss: 1.2029 - val_accuracy: 0.7118 - val_loss: 0.8880 - learning_rate: 0.0010
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.6683 - loss: 0.9954 - val_accuracy: 0.7409 - val_loss: 0.7648 - learning_rate: 0.0010
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.6972 - loss: 0.8884 - val_accuracy: 0.8068 - val_loss: 0.6020 - learning_rate: 0.0010
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 44ms/step - accuracy: 0.7432 - loss: 0.7655 - val_accuracy: 0.8041 - val_loss: 0.5961 - learning_rate: 0.0010
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 43ms/step - accuracy: 0.7681 - loss: 0.6929 - val_accuracy: 0.8205 - val_loss: 0.5339 - learning_rate: 0.0010
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.8022 - loss: 0.6126 - val_accuracy: 0.8700 - val_loss: 0.4004 - learning_rate: 0.0010
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.8183 - loss: 0.5558 - val_accuracy: 0.8845 - val_loss: 0.3864 - learning_rate: 0.0010
Epoch 17/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 42ms/step - accuracy: 0.8341 - loss: 0.5117 - val_accuracy: 0.9068 - val_loss: 0.3174 - learning_rate: 0.0010
Epoch 18/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 44ms/step - accuracy: 0.8509 - loss: 0.4545 - val_accuracy: 0.8941 - val_loss: 0.3435 - learning_rate: 0.0010
Epoch 19/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 44ms/step - accuracy: 0.8577 - loss: 0.4328 - val_accuracy: 0.9145 - val_loss: 0.2938 - learning_rate: 0.0010
Epoch 20/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.8656 - loss: 0.4118 - val_accuracy: 0.9227 - val_loss: 0.2655 - learning_rate: 0.0010
In [ ]:
# ------------------------------Small------------------------------
aug_small_vgg = vgg_model()
aug_small_vgg.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

aug_small_vgg.summary()


# ------------------------------Large------------------------------
aug_large_vgg = vgg_model(input_shape=(101, 101, 1))
aug_large_vgg.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

aug_large_vgg.summary()
Model: "sequential_46"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d_184 (Conv2D)             │ (None, 23, 23, 32)     │           320 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_185 (Conv2D)             │ (None, 23, 23, 32)     │         9,248 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_46 (MaxPooling2D) │ (None, 23, 11, 32)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_186 (Conv2D)             │ (None, 23, 11, 64)     │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_187 (Conv2D)             │ (None, 23, 11, 64)     │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_47 (MaxPooling2D) │ (None, 11, 5, 64)      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_188 (Conv2D)             │ (None, 11, 5, 128)     │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_189 (Conv2D)             │ (None, 11, 5, 128)     │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_34     │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_91 (Dense)                │ (None, 128)            │        16,512 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_64 (Dropout)            │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_92 (Dense)                │ (None, 11)             │         1,419 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 304,363 (1.16 MB)
 Trainable params: 304,363 (1.16 MB)
 Non-trainable params: 0 (0.00 B)
Model: "sequential_47"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d_190 (Conv2D)             │ (None, 101, 101, 32)   │           320 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_191 (Conv2D)             │ (None, 101, 101, 32)   │         9,248 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_48 (MaxPooling2D) │ (None, 101, 50, 32)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_192 (Conv2D)             │ (None, 101, 50, 64)    │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_193 (Conv2D)             │ (None, 101, 50, 64)    │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_49 (MaxPooling2D) │ (None, 50, 25, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_194 (Conv2D)             │ (None, 50, 25, 128)    │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_195 (Conv2D)             │ (None, 50, 25, 128)    │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_35     │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_93 (Dense)                │ (None, 128)            │        16,512 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_65 (Dropout)            │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_94 (Dense)                │ (None, 11)             │         1,419 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 304,363 (1.16 MB)
 Trainable params: 304,363 (1.16 MB)
 Non-trainable params: 0 (0.00 B)
In [ ]:
# ------------------------------Small------------------------------
aug_small_vgg_history = aug_small_vgg.fit(
    augmented_small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

small_history_dict['VGG with Augmented Data'] = aug_small_vgg_history.history


# ------------------------------Large------------------------------
aug_large_vgg_history = aug_large_vgg.fit(
    augmented_large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

large_history_dict['VGG with Augmented Data'] = aug_large_vgg_history.history
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 7s 12ms/step - accuracy: 0.0902 - loss: 2.3988 - val_accuracy: 0.0818 - val_loss: 2.3971 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.1367 - loss: 2.3392 - val_accuracy: 0.1791 - val_loss: 2.2224 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.1780 - loss: 2.2512 - val_accuracy: 0.2095 - val_loss: 2.1647 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.2134 - loss: 2.2164 - val_accuracy: 0.2986 - val_loss: 2.0657 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2606 - loss: 2.1261 - val_accuracy: 0.3214 - val_loss: 1.9577 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2844 - loss: 2.0557 - val_accuracy: 0.3109 - val_loss: 2.0132 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.3036 - loss: 1.9854 - val_accuracy: 0.2777 - val_loss: 2.1146 - learning_rate: 0.0010
Epoch 8/20
324/328 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.3291 - loss: 1.9255
Epoch 8: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.3293 - loss: 1.9250 - val_accuracy: 0.3177 - val_loss: 2.0561 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.3726 - loss: 1.8224 - val_accuracy: 0.3655 - val_loss: 1.9080 - learning_rate: 5.0000e-04
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.3998 - loss: 1.7446 - val_accuracy: 0.3727 - val_loss: 1.9407 - learning_rate: 5.0000e-04
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4274 - loss: 1.6660 - val_accuracy: 0.3677 - val_loss: 2.0072 - learning_rate: 5.0000e-04
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.4577 - loss: 1.6016 - val_accuracy: 0.4168 - val_loss: 1.8113 - learning_rate: 5.0000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4835 - loss: 1.5260 - val_accuracy: 0.4277 - val_loss: 1.7844 - learning_rate: 5.0000e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.5113 - loss: 1.4596 - val_accuracy: 0.4341 - val_loss: 1.7768 - learning_rate: 5.0000e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5361 - loss: 1.3858 - val_accuracy: 0.4705 - val_loss: 1.7161 - learning_rate: 5.0000e-04
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5478 - loss: 1.3271 - val_accuracy: 0.4700 - val_loss: 1.7418 - learning_rate: 5.0000e-04
Epoch 17/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5748 - loss: 1.2672 - val_accuracy: 0.5136 - val_loss: 1.5474 - learning_rate: 5.0000e-04
Epoch 18/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.5905 - loss: 1.2128 - val_accuracy: 0.5164 - val_loss: 1.5563 - learning_rate: 5.0000e-04
Epoch 19/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6147 - loss: 1.1438 - val_accuracy: 0.4586 - val_loss: 1.9316 - learning_rate: 5.0000e-04
Epoch 20/20
317/328 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6301 - loss: 1.0988
Epoch 20: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6304 - loss: 1.0982 - val_accuracy: 0.5132 - val_loss: 1.6573 - learning_rate: 5.0000e-04
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 50ms/step - accuracy: 0.1012 - loss: 2.3922 - val_accuracy: 0.1623 - val_loss: 2.3155 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 16s 42ms/step - accuracy: 0.1653 - loss: 2.2837 - val_accuracy: 0.1909 - val_loss: 2.2069 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.2239 - loss: 2.1836 - val_accuracy: 0.2695 - val_loss: 2.0757 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.2575 - loss: 2.0939 - val_accuracy: 0.3109 - val_loss: 2.0094 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 44ms/step - accuracy: 0.2909 - loss: 2.0366 - val_accuracy: 0.3641 - val_loss: 1.8792 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 44ms/step - accuracy: 0.3298 - loss: 1.9410 - val_accuracy: 0.4159 - val_loss: 1.6898 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.4020 - loss: 1.7414 - val_accuracy: 0.5055 - val_loss: 1.4599 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 43ms/step - accuracy: 0.4576 - loss: 1.5496 - val_accuracy: 0.5486 - val_loss: 1.2989 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 43ms/step - accuracy: 0.5100 - loss: 1.4018 - val_accuracy: 0.5927 - val_loss: 1.1631 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 43ms/step - accuracy: 0.5409 - loss: 1.3181 - val_accuracy: 0.6114 - val_loss: 1.1246 - learning_rate: 0.0010
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.5745 - loss: 1.2204 - val_accuracy: 0.6705 - val_loss: 0.9714 - learning_rate: 0.0010
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 43ms/step - accuracy: 0.5964 - loss: 1.1551 - val_accuracy: 0.6568 - val_loss: 1.0049 - learning_rate: 0.0010
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.6231 - loss: 1.0560 - val_accuracy: 0.6868 - val_loss: 0.8748 - learning_rate: 0.0010
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.6443 - loss: 1.0096 - val_accuracy: 0.6727 - val_loss: 0.9522 - learning_rate: 0.0010
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 44ms/step - accuracy: 0.6679 - loss: 0.9486 - val_accuracy: 0.6695 - val_loss: 1.0153 - learning_rate: 0.0010
Epoch 16/20
327/328 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step - accuracy: 0.6970 - loss: 0.8981
Epoch 16: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 43ms/step - accuracy: 0.6970 - loss: 0.8980 - val_accuracy: 0.6936 - val_loss: 0.9360 - learning_rate: 0.0010

Again, we noticed that the model performed significantly worse when fed the augmented data.

3. Mini-Resnet-inspired Model¶

  1. Mini-Resnet-inspired Model
  • Starts with a base convolution followed by custom residual blocks.

  • Each residual block contains two Conv2D layers, each followed by BatchNormalization.

  • Skip (identity) connections to avoid vanishing gradients and encourage gradient flow.

  • Employs MaxPooling2D and GlobalAveragePooling2D to reduce computation.

  • Final classification head: Dense(64) + Dropout before the softmax layer.
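The gradient-flow claim follows from a one-line derivative: for y = f(x) + x, dy/dx = f'(x) + 1, so even when f'(x) is near zero the block still passes gradient through the identity path. A minimal stdlib sketch with a scalar toy block (not the Keras implementation below):

```python
import math

# Toy scalar residual block: y = f(x) + x with f(x) = tanh(w*x).
def residual_forward(x, w):
    return math.tanh(w * x) + x

# Its derivative: f'(x) + 1, the "+1" coming from the identity skip.
def residual_grad(x, w):
    return w * (1.0 - math.tanh(w * x) ** 2) + 1.0

x, w = 2.0, 1e-6   # near-zero weight, so f'(x) is ~1e-6
g = residual_grad(x, w)
# Without the skip the gradient would be ~1e-6; with it, g stays near 1,
# which is why deep residual stacks resist vanishing gradients.
print(g)
```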

In [ ]:
def residual_block(x, filters):
    # Identity shortcut; assumes the input already has `filters` channels
    shortcut = x
    x = tf.keras.layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Conv2D(filters, (3, 3), padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    # Add the skip connection, then apply the non-linearity
    x = tf.keras.layers.Add()([x, shortcut])
    x = tf.keras.layers.Activation('relu')(x)
    return x



def mini_resnet(input_shape=(23, 23, 1), num_classes=11):
    inputs = tf.keras.Input(shape=input_shape)
    # Stem: base convolution lifts the grayscale input to 32 channels
    x = tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu')(inputs)
    x = tf.keras.layers.BatchNormalization()(x)

    x = residual_block(x, 32)
    x = tf.keras.layers.MaxPooling2D(pool_size=(1, 2))(x)

    # Widen to 64 channels so the next block's skip shapes match
    x = tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation='relu')(x)
    x = residual_block(x, 64)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)

    # Classifier head
    x = tf.keras.layers.Dense(64, activation='relu')(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)

    return tf.keras.Model(inputs, outputs)
In [ ]:
# ------------------------------Small------------------------------
small_resnet_model = mini_resnet()
small_resnet_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

small_resnet_model.summary()

# ------------------------------Large------------------------------
large_resnet_model = mini_resnet(input_shape=(101, 101, 1))
large_resnet_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

large_resnet_model.summary()
Model: "functional_50"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input_layer_50      │ (None, 23, 23, 1) │          0 │ -                 │
│ (InputLayer)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_136 (Conv2D) │ (None, 23, 23,    │        320 │ input_layer_50[0… │
│                     │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 23,    │        128 │ conv2d_136[0][0]  │
│ (BatchNormalizatio… │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_137 (Conv2D) │ (None, 23, 23,    │      9,248 │ batch_normalizat… │
│                     │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 23,    │        128 │ conv2d_137[0][0]  │
│ (BatchNormalizatio… │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_138 (Conv2D) │ (None, 23, 23,    │      9,248 │ batch_normalizat… │
│                     │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 23,    │        128 │ conv2d_138[0][0]  │
│ (BatchNormalizatio… │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ add_8 (Add)         │ (None, 23, 23,    │          0 │ batch_normalizat… │
│                     │ 32)               │            │ batch_normalizat… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation_8        │ (None, 23, 23,    │          0 │ add_8[0][0]       │
│ (Activation)        │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d_36    │ (None, 23, 11,    │          0 │ activation_8[0][… │
│ (MaxPooling2D)      │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_139 (Conv2D) │ (None, 23, 11,    │     18,496 │ max_pooling2d_36… │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_140 (Conv2D) │ (None, 23, 11,    │     36,928 │ conv2d_139[0][0]  │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 11,    │        256 │ conv2d_140[0][0]  │
│ (BatchNormalizatio… │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_141 (Conv2D) │ (None, 23, 11,    │     36,928 │ batch_normalizat… │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 11,    │        256 │ conv2d_141[0][0]  │
│ (BatchNormalizatio… │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ add_9 (Add)         │ (None, 23, 11,    │          0 │ batch_normalizat… │
│                     │ 64)               │            │ conv2d_139[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation_9        │ (None, 23, 11,    │          0 │ add_9[0][0]       │
│ (Activation)        │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ global_average_poo… │ (None, 64)        │          0 │ activation_9[0][… │
│ (GlobalAveragePool… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_75 (Dense)    │ (None, 64)        │      4,160 │ global_average_p… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_50          │ (None, 64)        │          0 │ dense_75[0][0]    │
│ (Dropout)           │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_76 (Dense)    │ (None, 11)        │        715 │ dropout_50[0][0]  │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 116,939 (456.79 KB)
 Trainable params: 116,491 (455.04 KB)
 Non-trainable params: 448 (1.75 KB)
Model: "functional_51"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input_layer_51      │ (None, 101, 101,  │          0 │ -                 │
│ (InputLayer)        │ 1)                │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_142 (Conv2D) │ (None, 101, 101,  │        320 │ input_layer_51[0… │
│                     │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 101,  │        128 │ conv2d_142[0][0]  │
│ (BatchNormalizatio… │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_143 (Conv2D) │ (None, 101, 101,  │      9,248 │ batch_normalizat… │
│                     │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 101,  │        128 │ conv2d_143[0][0]  │
│ (BatchNormalizatio… │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_144 (Conv2D) │ (None, 101, 101,  │      9,248 │ batch_normalizat… │
│                     │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 101,  │        128 │ conv2d_144[0][0]  │
│ (BatchNormalizatio… │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ add_10 (Add)        │ (None, 101, 101,  │          0 │ batch_normalizat… │
│                     │ 32)               │            │ batch_normalizat… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation_10       │ (None, 101, 101,  │          0 │ add_10[0][0]      │
│ (Activation)        │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d_37    │ (None, 101, 50,   │          0 │ activation_10[0]… │
│ (MaxPooling2D)      │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_145 (Conv2D) │ (None, 101, 50,   │     18,496 │ max_pooling2d_37… │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_146 (Conv2D) │ (None, 101, 50,   │     36,928 │ conv2d_145[0][0]  │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 50,   │        256 │ conv2d_146[0][0]  │
│ (BatchNormalizatio… │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_147 (Conv2D) │ (None, 101, 50,   │     36,928 │ batch_normalizat… │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 50,   │        256 │ conv2d_147[0][0]  │
│ (BatchNormalizatio… │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ add_11 (Add)        │ (None, 101, 50,   │          0 │ batch_normalizat… │
│                     │ 64)               │            │ conv2d_145[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation_11       │ (None, 101, 50,   │          0 │ add_11[0][0]      │
│ (Activation)        │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ global_average_poo… │ (None, 64)        │          0 │ activation_11[0]… │
│ (GlobalAveragePool… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_77 (Dense)    │ (None, 64)        │      4,160 │ global_average_p… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_51          │ (None, 64)        │          0 │ dense_77[0][0]    │
│ (Dropout)           │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_78 (Dense)    │ (None, 11)        │        715 │ dropout_51[0][0]  │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 116,939 (456.79 KB)
 Trainable params: 116,491 (455.04 KB)
 Non-trainable params: 448 (1.75 KB)
In [ ]:
# ------------------------------Small------------------------------
small_resnet_history = small_resnet_model.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

small_history_dict['ResNet50'] = small_resnet_history.history

# ------------------------------Large------------------------------
large_resnet_history = large_resnet_model.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

large_history_dict['ResNet50'] = large_resnet_history.history
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 17ms/step - accuracy: 0.2208 - loss: 2.2043 - val_accuracy: 0.0909 - val_loss: 4.3744 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.4300 - loss: 1.6250 - val_accuracy: 0.2655 - val_loss: 2.8556 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5792 - loss: 1.2778 - val_accuracy: 0.5591 - val_loss: 1.2504 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6611 - loss: 1.0438 - val_accuracy: 0.4414 - val_loss: 1.8652 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7213 - loss: 0.8762 - val_accuracy: 0.6573 - val_loss: 1.0443 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7715 - loss: 0.7279 - val_accuracy: 0.6305 - val_loss: 1.0817 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7905 - loss: 0.6701 - val_accuracy: 0.6527 - val_loss: 1.3953 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8194 - loss: 0.5846 - val_accuracy: 0.6709 - val_loss: 1.0047 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8498 - loss: 0.4996 - val_accuracy: 0.7418 - val_loss: 0.9050 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8611 - loss: 0.4382 - val_accuracy: 0.6764 - val_loss: 0.9837 - learning_rate: 0.0010
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8826 - loss: 0.3791 - val_accuracy: 0.7605 - val_loss: 0.8136 - learning_rate: 0.0010
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8874 - loss: 0.3647 - val_accuracy: 0.7709 - val_loss: 0.7217 - learning_rate: 0.0010
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9061 - loss: 0.3135 - val_accuracy: 0.8255 - val_loss: 0.5675 - learning_rate: 0.0010
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9169 - loss: 0.2681 - val_accuracy: 0.8495 - val_loss: 0.5075 - learning_rate: 0.0010
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9249 - loss: 0.2468 - val_accuracy: 0.7000 - val_loss: 1.2336 - learning_rate: 0.0010
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9223 - loss: 0.2415 - val_accuracy: 0.8409 - val_loss: 0.5300 - learning_rate: 0.0010
Epoch 17/20
322/328 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9406 - loss: 0.2046
Epoch 17: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.9405 - loss: 0.2048 - val_accuracy: 0.7345 - val_loss: 1.2525 - learning_rate: 0.0010
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 30s 69ms/step - accuracy: 0.2482 - loss: 2.1284 - val_accuracy: 0.0909 - val_loss: 4.9257 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 34s 59ms/step - accuracy: 0.4549 - loss: 1.5550 - val_accuracy: 0.1555 - val_loss: 3.9594 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 59ms/step - accuracy: 0.5697 - loss: 1.2446 - val_accuracy: 0.4791 - val_loss: 1.4131 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.6737 - loss: 0.9698 - val_accuracy: 0.6436 - val_loss: 1.1558 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 57ms/step - accuracy: 0.7348 - loss: 0.8067 - val_accuracy: 0.6427 - val_loss: 0.9848 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.7861 - loss: 0.6780 - val_accuracy: 0.7323 - val_loss: 0.9397 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 58ms/step - accuracy: 0.8310 - loss: 0.5419 - val_accuracy: 0.5468 - val_loss: 1.5731 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.8548 - loss: 0.4666 - val_accuracy: 0.5991 - val_loss: 1.4657 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.8888 - loss: 0.3764
Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.8888 - loss: 0.3764 - val_accuracy: 0.4982 - val_loss: 2.2022 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9161 - loss: 0.2873 - val_accuracy: 0.8527 - val_loss: 0.4331 - learning_rate: 5.0000e-04
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 57ms/step - accuracy: 0.9276 - loss: 0.2569 - val_accuracy: 0.9450 - val_loss: 0.1915 - learning_rate: 5.0000e-04
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9284 - loss: 0.2361 - val_accuracy: 0.8805 - val_loss: 0.4103 - learning_rate: 5.0000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9352 - loss: 0.2188 - val_accuracy: 0.9564 - val_loss: 0.1487 - learning_rate: 5.0000e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 59ms/step - accuracy: 0.9464 - loss: 0.1962 - val_accuracy: 0.7986 - val_loss: 0.6648 - learning_rate: 5.0000e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 58ms/step - accuracy: 0.9512 - loss: 0.1854 - val_accuracy: 0.8955 - val_loss: 0.3156 - learning_rate: 5.0000e-04
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.9439 - loss: 0.1877
Epoch 16: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9439 - loss: 0.1877 - val_accuracy: 0.9318 - val_loss: 0.2156 - learning_rate: 5.0000e-04
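The `early_stop` and `reduce_lr` callbacks passed to `fit` above are defined earlier in the notebook. A minimal sketch consistent with the logs is shown below; the `factor=0.5` matches the logged learning-rate drops (1e-3 → 5e-4 → 2.5e-4), but the `patience` and `min_lr` values are assumptions:

```python
import tensorflow as tf

# Stop training once val_loss stops improving and keep the best weights.
# patience=5 is an assumption; the notebook's actual value may differ.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=5, restore_best_weights=True
)

# Halve the learning rate on a val_loss plateau (factor=0.5 matches the
# logged drops); patience=3 and min_lr are assumptions.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.5, patience=3, min_lr=1e-6
)
```

Halving rather than, say, dividing by 10 lets training recover gradually from a plateau, which is visible in the runs above where accuracy jumps shortly after each reduction.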
In [ ]:
# ------------------------------Small------------------------------
aug_small_resnet_model = mini_resnet()
aug_small_resnet_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

aug_small_resnet_model.summary()


# ------------------------------Large------------------------------
aug_large_resnet_model = mini_resnet(input_shape=(101, 101, 1))
aug_large_resnet_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

aug_large_resnet_model.summary()
Model: "functional_52"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input_layer_52      │ (None, 23, 23, 1) │          0 │ -                 │
│ (InputLayer)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_148 (Conv2D) │ (None, 23, 23,    │        320 │ input_layer_52[0… │
│                     │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 23,    │        128 │ conv2d_148[0][0]  │
│ (BatchNormalizatio… │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_149 (Conv2D) │ (None, 23, 23,    │      9,248 │ batch_normalizat… │
│                     │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 23,    │        128 │ conv2d_149[0][0]  │
│ (BatchNormalizatio… │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_150 (Conv2D) │ (None, 23, 23,    │      9,248 │ batch_normalizat… │
│                     │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 23,    │        128 │ conv2d_150[0][0]  │
│ (BatchNormalizatio… │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ add_12 (Add)        │ (None, 23, 23,    │          0 │ batch_normalizat… │
│                     │ 32)               │            │ batch_normalizat… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation_12       │ (None, 23, 23,    │          0 │ add_12[0][0]      │
│ (Activation)        │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d_38    │ (None, 23, 11,    │          0 │ activation_12[0]… │
│ (MaxPooling2D)      │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_151 (Conv2D) │ (None, 23, 11,    │     18,496 │ max_pooling2d_38… │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_152 (Conv2D) │ (None, 23, 11,    │     36,928 │ conv2d_151[0][0]  │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 11,    │        256 │ conv2d_152[0][0]  │
│ (BatchNormalizatio… │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_153 (Conv2D) │ (None, 23, 11,    │     36,928 │ batch_normalizat… │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 11,    │        256 │ conv2d_153[0][0]  │
│ (BatchNormalizatio… │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ add_13 (Add)        │ (None, 23, 11,    │          0 │ batch_normalizat… │
│                     │ 64)               │            │ conv2d_151[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation_13       │ (None, 23, 11,    │          0 │ add_13[0][0]      │
│ (Activation)        │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ global_average_poo… │ (None, 64)        │          0 │ activation_13[0]… │
│ (GlobalAveragePool… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_79 (Dense)    │ (None, 64)        │      4,160 │ global_average_p… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_52          │ (None, 64)        │          0 │ dense_79[0][0]    │
│ (Dropout)           │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_80 (Dense)    │ (None, 11)        │        715 │ dropout_52[0][0]  │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 116,939 (456.79 KB)
 Trainable params: 116,491 (455.04 KB)
 Non-trainable params: 448 (1.75 KB)
Model: "functional_53"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input_layer_53      │ (None, 101, 101,  │          0 │ -                 │
│ (InputLayer)        │ 1)                │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_154 (Conv2D) │ (None, 101, 101,  │        320 │ input_layer_53[0… │
│                     │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 101,  │        128 │ conv2d_154[0][0]  │
│ (BatchNormalizatio… │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_155 (Conv2D) │ (None, 101, 101,  │      9,248 │ batch_normalizat… │
│                     │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 101,  │        128 │ conv2d_155[0][0]  │
│ (BatchNormalizatio… │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_156 (Conv2D) │ (None, 101, 101,  │      9,248 │ batch_normalizat… │
│                     │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 101,  │        128 │ conv2d_156[0][0]  │
│ (BatchNormalizatio… │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ add_14 (Add)        │ (None, 101, 101,  │          0 │ batch_normalizat… │
│                     │ 32)               │            │ batch_normalizat… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation_14       │ (None, 101, 101,  │          0 │ add_14[0][0]      │
│ (Activation)        │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d_39    │ (None, 101, 50,   │          0 │ activation_14[0]… │
│ (MaxPooling2D)      │ 32)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_157 (Conv2D) │ (None, 101, 50,   │     18,496 │ max_pooling2d_39… │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_158 (Conv2D) │ (None, 101, 50,   │     36,928 │ conv2d_157[0][0]  │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 50,   │        256 │ conv2d_158[0][0]  │
│ (BatchNormalizatio… │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_159 (Conv2D) │ (None, 101, 50,   │     36,928 │ batch_normalizat… │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 50,   │        256 │ conv2d_159[0][0]  │
│ (BatchNormalizatio… │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ add_15 (Add)        │ (None, 101, 50,   │          0 │ batch_normalizat… │
│                     │ 64)               │            │ conv2d_157[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ activation_15       │ (None, 101, 50,   │          0 │ add_15[0][0]      │
│ (Activation)        │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ global_average_poo… │ (None, 64)        │          0 │ activation_15[0]… │
│ (GlobalAveragePool… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_81 (Dense)    │ (None, 64)        │      4,160 │ global_average_p… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_53          │ (None, 64)        │          0 │ dense_81[0][0]    │
│ (Dropout)           │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_82 (Dense)    │ (None, 11)        │        715 │ dropout_53[0][0]  │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 116,939 (456.79 KB)
 Trainable params: 116,491 (455.04 KB)
 Non-trainable params: 448 (1.75 KB)
In [ ]:
# ------------------------------Small------------------------------
aug_small_resnet_history = aug_small_resnet_model.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

small_history_dict['ResNet50 with Augmented Data'] = aug_small_resnet_history.history  # Save training history


# ------------------------------Large------------------------------
aug_large_resnet_history = aug_large_resnet_model.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

large_history_dict['ResNet50 with Augmented Data'] = aug_large_resnet_history.history  # Save training history
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 21ms/step - accuracy: 0.2185 - loss: 2.1924 - val_accuracy: 0.0909 - val_loss: 12.4494 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4811 - loss: 1.5481 - val_accuracy: 0.3559 - val_loss: 1.8525 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6027 - loss: 1.1984 - val_accuracy: 0.5005 - val_loss: 1.6080 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6699 - loss: 0.9985 - val_accuracy: 0.4464 - val_loss: 1.7087 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7331 - loss: 0.8312 - val_accuracy: 0.5895 - val_loss: 1.1781 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7813 - loss: 0.6913 - val_accuracy: 0.6277 - val_loss: 1.3663 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.8091 - loss: 0.5971 - val_accuracy: 0.8041 - val_loss: 0.5582 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8308 - loss: 0.5490 - val_accuracy: 0.7645 - val_loss: 0.7376 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8588 - loss: 0.4605 - val_accuracy: 0.7586 - val_loss: 0.7394 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8559 - loss: 0.4432
Epoch 10: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8559 - loss: 0.4432 - val_accuracy: 0.7727 - val_loss: 0.7658 - learning_rate: 0.0010
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9135 - loss: 0.2859 - val_accuracy: 0.8259 - val_loss: 0.5543 - learning_rate: 5.0000e-04
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9357 - loss: 0.2306 - val_accuracy: 0.8818 - val_loss: 0.3875 - learning_rate: 5.0000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9387 - loss: 0.2088 - val_accuracy: 0.8850 - val_loss: 0.3903 - learning_rate: 5.0000e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9435 - loss: 0.1888 - val_accuracy: 0.8391 - val_loss: 0.6033 - learning_rate: 5.0000e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9531 - loss: 0.1693 - val_accuracy: 0.8950 - val_loss: 0.3598 - learning_rate: 5.0000e-04
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9580 - loss: 0.1502 - val_accuracy: 0.8645 - val_loss: 0.4930 - learning_rate: 5.0000e-04
Epoch 17/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9585 - loss: 0.1432 - val_accuracy: 0.8805 - val_loss: 0.4434 - learning_rate: 5.0000e-04
Epoch 18/20
321/328 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9610 - loss: 0.1354
Epoch 18: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9609 - loss: 0.1356 - val_accuracy: 0.8300 - val_loss: 0.7313 - learning_rate: 5.0000e-04
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 31s 72ms/step - accuracy: 0.2698 - loss: 2.0492 - val_accuracy: 0.0909 - val_loss: 3.2624 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.4920 - loss: 1.4496 - val_accuracy: 0.3668 - val_loss: 2.0656 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.6382 - loss: 1.0860 - val_accuracy: 0.3450 - val_loss: 2.1705 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.7303 - loss: 0.8487 - val_accuracy: 0.7386 - val_loss: 0.8934 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.7915 - loss: 0.6768 - val_accuracy: 0.7350 - val_loss: 0.8019 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 57ms/step - accuracy: 0.8264 - loss: 0.5428 - val_accuracy: 0.7468 - val_loss: 0.8691 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.8560 - loss: 0.4682 - val_accuracy: 0.6136 - val_loss: 1.1981 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.8768 - loss: 0.4209 - val_accuracy: 0.8618 - val_loss: 0.4029 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 58ms/step - accuracy: 0.8954 - loss: 0.3431 - val_accuracy: 0.8009 - val_loss: 0.6974 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9127 - loss: 0.2976 - val_accuracy: 0.6545 - val_loss: 1.2640 - learning_rate: 0.0010
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.9128 - loss: 0.2907
Epoch 11: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9128 - loss: 0.2908 - val_accuracy: 0.6964 - val_loss: 1.0517 - learning_rate: 0.0010
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9474 - loss: 0.1988 - val_accuracy: 0.9109 - val_loss: 0.3740 - learning_rate: 5.0000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 58ms/step - accuracy: 0.9515 - loss: 0.1708 - val_accuracy: 0.8995 - val_loss: 0.3655 - learning_rate: 5.0000e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9570 - loss: 0.1620 - val_accuracy: 0.9291 - val_loss: 0.2467 - learning_rate: 5.0000e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9527 - loss: 0.1556 - val_accuracy: 0.9464 - val_loss: 0.1847 - learning_rate: 5.0000e-04
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9601 - loss: 0.1414 - val_accuracy: 0.9327 - val_loss: 0.2400 - learning_rate: 5.0000e-04
Epoch 17/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9554 - loss: 0.1522 - val_accuracy: 0.9600 - val_loss: 0.1463 - learning_rate: 5.0000e-04
Epoch 18/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 58ms/step - accuracy: 0.9640 - loss: 0.1262 - val_accuracy: 0.8118 - val_loss: 0.8915 - learning_rate: 5.0000e-04
Epoch 19/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 59ms/step - accuracy: 0.9655 - loss: 0.1256 - val_accuracy: 0.9727 - val_loss: 0.1055 - learning_rate: 5.0000e-04
Epoch 20/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9676 - loss: 0.1161 - val_accuracy: 0.9045 - val_loss: 0.3924 - learning_rate: 5.0000e-04

4. MobileNet-Lite-inspired model¶

  1. MobileNet-Lite-inspired model
  • Tailored for efficiency: uses SeparableConv2D (depthwise separable convolutions) to drastically reduce parameter count and computation relative to standard convolutions.

  • Filter counts progress from 32 to 64 to 128 across the blocks.

  • Each block consists of a SeparableConv2D, BatchNormalization, and MaxPooling (GlobalAveragePooling in the final block).

  • Starts with a Rescaling layer to normalize input pixel values to [0, 1].

  • Ends with a compact Dense(64) layer and Dropout(0.3) before the softmax output.
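The parameter savings from depthwise separable convolutions can be checked by hand using the standard Keras parameter counts (depthwise kernel, then a 1x1 pointwise convolution with bias). The results match the model summaries: 73 and 2,400 parameters for the first two SeparableConv2D layers below, versus 18,496 for the equivalent standard Conv2D in the mini-ResNet summaries above.

```python
def separable_conv_params(k, c_in, c_out):
    # Depthwise step: one k*k filter per input channel (no bias here).
    depthwise = k * k * c_in
    # Pointwise step: 1x1 conv mapping c_in -> c_out, plus c_out biases.
    pointwise = c_in * c_out + c_out
    return depthwise + pointwise

def standard_conv_params(k, c_in, c_out):
    # Full k*k kernels over all input channels, plus c_out biases.
    return k * k * c_in * c_out + c_out

print(separable_conv_params(3, 1, 32))    # 73 parameters
print(separable_conv_params(3, 32, 64))   # 2400 parameters
print(standard_conv_params(3, 32, 64))    # 18496 parameters
```

For the 32 → 64 block this is a roughly 7.7x reduction, which is where most of the gap between the ~117k-parameter mini-ResNet and the ~21k-parameter MobileNet-Lite model comes from.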

In [ ]:
def mobilenet_lite(input_shape=(23, 23, 1), num_classes=11):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Rescaling(1./255),  # normalize pixels to [0, 1]

        # Block 1: note pool_size=(1, 2) halves only the width
        tf.keras.layers.SeparableConv2D(32, (3, 3), padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(1, 2)),

        # Block 2: standard 2x2 downsampling
        tf.keras.layers.SeparableConv2D(64, (3, 3), padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),

        # Block 3: global average pooling replaces flattening
        tf.keras.layers.SeparableConv2D(128, (3, 3), padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.GlobalAveragePooling2D(),

        # Classifier head
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(num_classes, activation='softmax')
    ])
    return model
In [ ]:
# ------------------------------Small------------------------------
small_mobilenet_model = mobilenet_lite()
small_mobilenet_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

small_mobilenet_model.summary()


# ------------------------------Large------------------------------
large_mobilenet_model = mobilenet_lite(input_shape=(101, 101, 1))
large_mobilenet_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

large_mobilenet_model.summary()
Model: "sequential_38"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ rescaling (Rescaling)           │ (None, 23, 23, 1)      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ separable_conv2d                │ (None, 23, 23, 32)     │            73 │
│ (SeparableConv2D)               │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_68          │ (None, 23, 23, 32)     │           128 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_24 (MaxPooling2D) │ (None, 23, 11, 32)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ separable_conv2d_1              │ (None, 23, 11, 64)     │         2,400 │
│ (SeparableConv2D)               │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_69          │ (None, 23, 11, 64)     │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_25 (MaxPooling2D) │ (None, 11, 5, 64)      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ separable_conv2d_2              │ (None, 11, 5, 128)     │         8,896 │
│ (SeparableConv2D)               │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_70          │ (None, 11, 5, 128)     │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_18     │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_59 (Dense)                │ (None, 64)             │         8,256 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_42 (Dropout)            │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_60 (Dense)                │ (None, 11)             │           715 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 21,236 (82.95 KB)
 Trainable params: 20,788 (81.20 KB)
 Non-trainable params: 448 (1.75 KB)
Model: "sequential_39"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ rescaling_1 (Rescaling)         │ (None, 101, 101, 1)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ separable_conv2d_3              │ (None, 101, 101, 32)   │            73 │
│ (SeparableConv2D)               │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_71          │ (None, 101, 101, 32)   │           128 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_26 (MaxPooling2D) │ (None, 101, 50, 32)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ separable_conv2d_4              │ (None, 101, 50, 64)    │         2,400 │
│ (SeparableConv2D)               │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_72          │ (None, 101, 50, 64)    │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_27 (MaxPooling2D) │ (None, 50, 25, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ separable_conv2d_5              │ (None, 50, 25, 128)    │         8,896 │
│ (SeparableConv2D)               │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_73          │ (None, 50, 25, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_19     │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_61 (Dense)                │ (None, 64)             │         8,256 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_43 (Dropout)            │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_62 (Dense)                │ (None, 11)             │           715 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 21,236 (82.95 KB)
 Trainable params: 20,788 (81.20 KB)
 Non-trainable params: 448 (1.75 KB)
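The parameter counts in the summaries above can be verified by hand: a 3×3 SeparableConv2D spends 3·3·C_in weights on the depthwise pass, C_in·C_out on the pointwise pass, plus C_out biases, and BatchNormalization adds 4 parameters per channel (gamma, beta, moving mean, moving variance), of which only half are trainable. A quick arithmetic check, no TensorFlow required:

```python
def separable_conv_params(c_in, c_out, k=3):
    """Depthwise (k*k*c_in) + pointwise (c_in*c_out) weights + c_out biases."""
    return k * k * c_in + c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Fully connected layer: weights + biases."""
    return n_in * n_out + n_out

# The three separable conv layers in the summary
print(separable_conv_params(1, 32))    # 73
print(separable_conv_params(32, 64))   # 2400
print(separable_conv_params(64, 128))  # 8896

# BatchNormalization: 4 params per channel, 2 of them non-trainable
bn = 4 * (32 + 64 + 128)  # 896 total, 448 non-trainable

# Classifier head
head = dense_params(128, 64) + dense_params(64, 11)  # 8256 + 715

total = 73 + 2400 + 8896 + bn + head
print(total)  # 21236, matching "Total params: 21,236"
```

Note that the totals are identical for the 23×23 and 101×101 inputs: only the spatial dimensions of the feature maps change, not the number of weights, which is why both summaries report the same 21,236 parameters.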
In [ ]:
# ------------------------------Small------------------------------
small_mobilenet_history = small_mobilenet_model.fit(
    small_train,
    validation_data=small_val,
    epochs=30,
    callbacks=[early_stop, reduce_lr]
)

small_history_dict['MobileNet'] = small_mobilenet_history.history


# ------------------------------Large------------------------------
large_mobilenet_history = large_mobilenet_model.fit(
    large_train,
    validation_data=large_val,
    epochs=30,
    callbacks=[early_stop, reduce_lr]
)

large_history_dict['MobileNet'] = large_mobilenet_history.history
Epoch 1/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 12s 16ms/step - accuracy: 0.2149 - loss: 2.2140 - val_accuracy: 0.0909 - val_loss: 2.4556 - learning_rate: 0.0010
Epoch 2/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3719 - loss: 1.7944 - val_accuracy: 0.1182 - val_loss: 2.6646 - learning_rate: 0.0010
Epoch 3/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4593 - loss: 1.5753 - val_accuracy: 0.2095 - val_loss: 2.9365 - learning_rate: 0.0010
Epoch 4/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5228 - loss: 1.3817 - val_accuracy: 0.2432 - val_loss: 2.3082 - learning_rate: 0.0010
Epoch 5/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5737 - loss: 1.2547 - val_accuracy: 0.3391 - val_loss: 2.3796 - learning_rate: 0.0010
Epoch 6/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5959 - loss: 1.1978 - val_accuracy: 0.1314 - val_loss: 7.8764 - learning_rate: 0.0010
Epoch 7/30
319/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6278 - loss: 1.1042
Epoch 7: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.6279 - loss: 1.1041 - val_accuracy: 0.3191 - val_loss: 2.7673 - learning_rate: 0.0010
Epoch 8/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6644 - loss: 0.9989 - val_accuracy: 0.4714 - val_loss: 1.6653 - learning_rate: 5.0000e-04
Epoch 9/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6630 - loss: 0.9911 - val_accuracy: 0.2505 - val_loss: 2.7174 - learning_rate: 5.0000e-04
Epoch 10/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6798 - loss: 0.9486 - val_accuracy: 0.2200 - val_loss: 6.9227 - learning_rate: 5.0000e-04
Epoch 11/30
315/328 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6982 - loss: 0.9058
Epoch 11: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.6981 - loss: 0.9059 - val_accuracy: 0.0909 - val_loss: 15.9827 - learning_rate: 5.0000e-04
Epoch 12/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7132 - loss: 0.8591 - val_accuracy: 0.2618 - val_loss: 3.5090 - learning_rate: 2.5000e-04
Epoch 13/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7146 - loss: 0.8566 - val_accuracy: 0.4350 - val_loss: 1.8526 - learning_rate: 2.5000e-04
Epoch 14/30
316/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7133 - loss: 0.8606
Epoch 14: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7134 - loss: 0.8598 - val_accuracy: 0.4341 - val_loss: 1.8461 - learning_rate: 2.5000e-04
Epoch 15/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7281 - loss: 0.8202 - val_accuracy: 0.5832 - val_loss: 1.2580 - learning_rate: 1.2500e-04
Epoch 16/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7290 - loss: 0.8125 - val_accuracy: 0.6923 - val_loss: 0.9420 - learning_rate: 1.2500e-04
Epoch 17/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7365 - loss: 0.7990 - val_accuracy: 0.3668 - val_loss: 3.1542 - learning_rate: 1.2500e-04
Epoch 18/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7368 - loss: 0.7989 - val_accuracy: 0.5677 - val_loss: 1.2543 - learning_rate: 1.2500e-04
Epoch 19/30
318/328 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7329 - loss: 0.7918
Epoch 19: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7329 - loss: 0.7916 - val_accuracy: 0.5286 - val_loss: 1.4801 - learning_rate: 1.2500e-04
Epoch 1/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 16s 30ms/step - accuracy: 0.2471 - loss: 2.1513 - val_accuracy: 0.0909 - val_loss: 2.4715 - learning_rate: 0.0010
Epoch 2/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 12s 14ms/step - accuracy: 0.4029 - loss: 1.7278 - val_accuracy: 0.0914 - val_loss: 6.5415 - learning_rate: 0.0010
Epoch 3/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.5058 - loss: 1.4302 - val_accuracy: 0.1455 - val_loss: 8.3258 - learning_rate: 0.0010
Epoch 4/30
325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.5647 - loss: 1.2593
Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.5648 - loss: 1.2590 - val_accuracy: 0.0909 - val_loss: 90.9654 - learning_rate: 0.0010
Epoch 5/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6206 - loss: 1.1180 - val_accuracy: 0.2245 - val_loss: 4.8245 - learning_rate: 5.0000e-04
Epoch 6/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6430 - loss: 1.0522 - val_accuracy: 0.2264 - val_loss: 8.3670 - learning_rate: 5.0000e-04
Epoch 7/30
325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.6510 - loss: 1.0162
Epoch 7: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6511 - loss: 1.0160 - val_accuracy: 0.1273 - val_loss: 22.6868 - learning_rate: 5.0000e-04
Epoch 8/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6840 - loss: 0.9573 - val_accuracy: 0.4150 - val_loss: 1.7806 - learning_rate: 2.5000e-04
Epoch 9/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.6873 - loss: 0.9237 - val_accuracy: 0.1227 - val_loss: 7.0741 - learning_rate: 2.5000e-04
Epoch 10/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6919 - loss: 0.9039 - val_accuracy: 0.2059 - val_loss: 7.9217 - learning_rate: 2.5000e-04
Epoch 11/30
327/328 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.7032 - loss: 0.8893
Epoch 11: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 15ms/step - accuracy: 0.7033 - loss: 0.8892 - val_accuracy: 0.2609 - val_loss: 3.9283 - learning_rate: 2.5000e-04
Epoch 12/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 6s 17ms/step - accuracy: 0.7097 - loss: 0.8478 - val_accuracy: 0.3895 - val_loss: 3.2631 - learning_rate: 1.2500e-04
Epoch 13/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 9s 14ms/step - accuracy: 0.7233 - loss: 0.8342 - val_accuracy: 0.4359 - val_loss: 1.6465 - learning_rate: 1.2500e-04
Epoch 14/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 16ms/step - accuracy: 0.7294 - loss: 0.8297 - val_accuracy: 0.6632 - val_loss: 0.9992 - learning_rate: 1.2500e-04
Epoch 15/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 10s 14ms/step - accuracy: 0.7308 - loss: 0.8294 - val_accuracy: 0.4536 - val_loss: 1.7776 - learning_rate: 1.2500e-04
Epoch 16/30
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7238 - loss: 0.8322 - val_accuracy: 0.5450 - val_loss: 1.2672 - learning_rate: 1.2500e-04
Epoch 17/30
325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.7232 - loss: 0.8129
Epoch 17: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
328/328 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.7232 - loss: 0.8128 - val_accuracy: 0.3795 - val_loss: 2.5536 - learning_rate: 1.2500e-04
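Each time ReduceLROnPlateau fires in the runs above, the printed learning rate halves (0.001 → 5e-4 → 2.5e-4 → 1.25e-4 → 6.25e-5), so the rates follow a simple geometric schedule. A short sketch; the factor 0.5 is read directly off these logs, while the callback's patience and monitor settings are configured earlier in the notebook:

```python
base_lr = 1e-3
factor = 0.5  # inferred from the halving visible in the logs

# Learning rate after n plateau-triggered reductions
schedule = [base_lr * factor ** n for n in range(5)]
print(schedule)
```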
In [ ]:
# ------------------------------Small------------------------------
aug_small_mobilenet_model = mobilenet_lite()
aug_small_mobilenet_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

aug_small_mobilenet_model.summary()

# ------------------------------Large------------------------------
aug_large_mobilenet_model = mobilenet_lite(input_shape=(101, 101, 1))
aug_large_mobilenet_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

aug_large_mobilenet_model.summary()
Model: "sequential_40"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ rescaling_2 (Rescaling)         │ (None, 23, 23, 1)      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ separable_conv2d_6              │ (None, 23, 23, 32)     │            73 │
│ (SeparableConv2D)               │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_74          │ (None, 23, 23, 32)     │           128 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_28 (MaxPooling2D) │ (None, 23, 11, 32)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ separable_conv2d_7              │ (None, 23, 11, 64)     │         2,400 │
│ (SeparableConv2D)               │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_75          │ (None, 23, 11, 64)     │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_29 (MaxPooling2D) │ (None, 11, 5, 64)      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ separable_conv2d_8              │ (None, 11, 5, 128)     │         8,896 │
│ (SeparableConv2D)               │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_76          │ (None, 11, 5, 128)     │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_20     │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_63 (Dense)                │ (None, 64)             │         8,256 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_44 (Dropout)            │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_64 (Dense)                │ (None, 11)             │           715 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 21,236 (82.95 KB)
 Trainable params: 20,788 (81.20 KB)
 Non-trainable params: 448 (1.75 KB)
Model: "sequential_41"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ rescaling_3 (Rescaling)         │ (None, 101, 101, 1)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ separable_conv2d_9              │ (None, 101, 101, 32)   │            73 │
│ (SeparableConv2D)               │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_77          │ (None, 101, 101, 32)   │           128 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_30 (MaxPooling2D) │ (None, 101, 50, 32)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ separable_conv2d_10             │ (None, 101, 50, 64)    │         2,400 │
│ (SeparableConv2D)               │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_78          │ (None, 101, 50, 64)    │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_31 (MaxPooling2D) │ (None, 50, 25, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ separable_conv2d_11             │ (None, 50, 25, 128)    │         8,896 │
│ (SeparableConv2D)               │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_79          │ (None, 50, 25, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_21     │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_65 (Dense)                │ (None, 64)             │         8,256 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_45 (Dropout)            │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_66 (Dense)                │ (None, 11)             │           715 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 21,236 (82.95 KB)
 Trainable params: 20,788 (81.20 KB)
 Non-trainable params: 448 (1.75 KB)
In [ ]:
# ------------------------------Small------------------------------
aug_small_mobilenet_history = aug_small_mobilenet_model.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

small_history_dict['MobileNet with Augmented Data'] = aug_small_mobilenet_history.history


# ------------------------------Large------------------------------
aug_large_mobilenet_history = aug_large_mobilenet_model.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

large_history_dict['MobileNet with Augmented Data'] = aug_large_mobilenet_history.history
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 10s 16ms/step - accuracy: 0.2225 - loss: 2.1902 - val_accuracy: 0.0909 - val_loss: 2.4308 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3473 - loss: 1.8656 - val_accuracy: 0.1150 - val_loss: 4.0065 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4346 - loss: 1.6383 - val_accuracy: 0.2936 - val_loss: 2.3334 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4884 - loss: 1.4905 - val_accuracy: 0.2114 - val_loss: 4.8738 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5347 - loss: 1.3611 - val_accuracy: 0.1623 - val_loss: 4.5721 - learning_rate: 0.0010
Epoch 6/20
321/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.5648 - loss: 1.2729
Epoch 6: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.5650 - loss: 1.2726 - val_accuracy: 0.0968 - val_loss: 17.3579 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.5962 - loss: 1.1879 - val_accuracy: 0.0905 - val_loss: 22.6661 - learning_rate: 5.0000e-04
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6145 - loss: 1.1455 - val_accuracy: 0.1295 - val_loss: 13.6056 - learning_rate: 5.0000e-04
Epoch 9/20
313/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6216 - loss: 1.1104
Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.6215 - loss: 1.1103 - val_accuracy: 0.2245 - val_loss: 5.8888 - learning_rate: 5.0000e-04
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6353 - loss: 1.0613 - val_accuracy: 0.1832 - val_loss: 4.1779 - learning_rate: 2.5000e-04
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6488 - loss: 1.0499 - val_accuracy: 0.2500 - val_loss: 3.8112 - learning_rate: 2.5000e-04
Epoch 12/20
316/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6490 - loss: 1.0331
Epoch 12: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6490 - loss: 1.0332 - val_accuracy: 0.1382 - val_loss: 9.2115 - learning_rate: 2.5000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6432 - loss: 1.0368 - val_accuracy: 0.6191 - val_loss: 1.1213 - learning_rate: 1.2500e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.6657 - loss: 0.9905 - val_accuracy: 0.4255 - val_loss: 1.8480 - learning_rate: 1.2500e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6654 - loss: 0.9999 - val_accuracy: 0.0909 - val_loss: 27.1148 - learning_rate: 1.2500e-04
Epoch 16/20
322/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6733 - loss: 0.9744
Epoch 16: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6731 - loss: 0.9747 - val_accuracy: 0.1450 - val_loss: 42.2199 - learning_rate: 1.2500e-04
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 28ms/step - accuracy: 0.2256 - loss: 2.1789 - val_accuracy: 0.0909 - val_loss: 2.4568 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.3973 - loss: 1.7607 - val_accuracy: 0.1045 - val_loss: 5.2101 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 15ms/step - accuracy: 0.4876 - loss: 1.4821 - val_accuracy: 0.0909 - val_loss: 13.3821 - learning_rate: 0.0010
Epoch 4/20
325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.5610 - loss: 1.2910
Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.5611 - loss: 1.2905 - val_accuracy: 0.2300 - val_loss: 3.4362 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6232 - loss: 1.1182 - val_accuracy: 0.3105 - val_loss: 2.3407 - learning_rate: 5.0000e-04
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6376 - loss: 1.0560 - val_accuracy: 0.5550 - val_loss: 1.1949 - learning_rate: 5.0000e-04
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6590 - loss: 0.9879 - val_accuracy: 0.3632 - val_loss: 1.9190 - learning_rate: 5.0000e-04
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 15ms/step - accuracy: 0.6763 - loss: 0.9465 - val_accuracy: 0.3223 - val_loss: 2.8640 - learning_rate: 5.0000e-04
Epoch 9/20
325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.7000 - loss: 0.8820
Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7000 - loss: 0.8820 - val_accuracy: 0.3686 - val_loss: 2.1724 - learning_rate: 5.0000e-04
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7074 - loss: 0.8467 - val_accuracy: 0.4118 - val_loss: 3.7875 - learning_rate: 2.5000e-04
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7287 - loss: 0.8173 - val_accuracy: 0.5345 - val_loss: 1.4572 - learning_rate: 2.5000e-04
Epoch 12/20
325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.7374 - loss: 0.7773
Epoch 12: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7374 - loss: 0.7773 - val_accuracy: 0.5586 - val_loss: 1.3115 - learning_rate: 2.5000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7400 - loss: 0.7568 - val_accuracy: 0.7359 - val_loss: 0.7604 - learning_rate: 1.2500e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7515 - loss: 0.7557 - val_accuracy: 0.3973 - val_loss: 2.3049 - learning_rate: 1.2500e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7557 - loss: 0.7239 - val_accuracy: 0.7114 - val_loss: 0.8436 - learning_rate: 1.2500e-04
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7640 - loss: 0.7311 - val_accuracy: 0.7441 - val_loss: 0.7242 - learning_rate: 1.2500e-04
Epoch 17/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7627 - loss: 0.7163 - val_accuracy: 0.4955 - val_loss: 1.7038 - learning_rate: 1.2500e-04
Epoch 18/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7656 - loss: 0.7222 - val_accuracy: 0.4186 - val_loss: 2.2651 - learning_rate: 1.2500e-04
Epoch 19/20
325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.7631 - loss: 0.7192
Epoch 19: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 15ms/step - accuracy: 0.7631 - loss: 0.7191 - val_accuracy: 0.3082 - val_loss: 3.9644 - learning_rate: 1.2500e-04

5. Mini-DenseNet-inspired model¶

  • Begins with a Conv2D layer (16 filters) followed by two Dense Blocks.

  • Each Dense Block contains 3 convolutional layers with growth rate = 12, each following the BatchNormalization → ReLU → Conv2D order, and concatenates the features of all previous layers (the key DenseNet trait).

  • A MaxPooling2D (1×2) follows the first block, and GlobalAveragePooling2D follows the second.

  • Final classifier: Dense(64) → Dropout(0.3) → softmax output.
In [ ]:
def densenet_block(x, growth_rate, layers):
    for _ in range(layers):
        out = tf.keras.layers.BatchNormalization()(x)
        out = tf.keras.layers.ReLU()(out)
        out = tf.keras.layers.Conv2D(growth_rate, (3, 3), padding='same')(out)
        x = tf.keras.layers.Concatenate()([x, out])
    return x



def mini_densenet(input_shape=(23, 23, 1), num_classes=11):
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(16, (3, 3), padding='same')(inputs)

    x = densenet_block(x, growth_rate=12, layers=3)
    x = tf.keras.layers.MaxPooling2D(pool_size=(1, 2))(x)

    x = densenet_block(x, growth_rate=12, layers=3)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)

    x = tf.keras.layers.Dense(64, activation='relu')(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)

    return tf.keras.Model(inputs, outputs)
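The channel growth of the two dense blocks can be traced by hand: each of the 3 layers concatenates 12 new feature maps onto everything before it, so the first block grows 16 → 28 → 40 → 52 channels and the second (starting from the 52 channels left after pooling) grows 52 → 64 → 76 → 88, matching the Concatenate output shapes in the model summary. A quick arithmetic check, no TensorFlow required:

```python
def dense_block_channels(c_in, growth_rate=12, layers=3):
    """Channel count after each concatenation in a dense block."""
    channels = [c_in]
    for _ in range(layers):
        channels.append(channels[-1] + growth_rate)
    return channels

def conv_params(c_in, c_out, k=3):
    """3x3 conv weights plus biases."""
    return k * k * c_in * c_out + c_out

# First block: starts from the 16-filter stem
print(dense_block_channels(16))  # [16, 28, 40, 52]
# Second block: starts after the (1, 2) max-pool, channels unchanged
print(dense_block_channels(52))  # [52, 64, 76, 88]

# Per-layer conv parameter counts match the summary, e.g.
print(conv_params(16, 12))  # 1740, as in conv2d_109
print(conv_params(28, 12))  # 3036, as in conv2d_110
```

This also shows why each successive Conv2D in the summary is more expensive than the last: the growth rate keeps the output at 12 channels, but the concatenated input keeps widening.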
In [ ]:
# ------------------------------Small------------------------------
small_densenet_model = mini_densenet()
small_densenet_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

small_densenet_model.summary()

# ------------------------------Large------------------------------
large_densenet_model = mini_densenet(input_shape=(101, 101, 1))
large_densenet_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

large_densenet_model.summary()
Model: "functional_46"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input_layer_46      │ (None, 23, 23, 1) │          0 │ -                 │
│ (InputLayer)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_108 (Conv2D) │ (None, 23, 23,    │        160 │ input_layer_46[0… │
│                     │ 16)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 23,    │         64 │ conv2d_108[0][0]  │
│ (BatchNormalizatio… │ 16)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu (ReLU)        │ (None, 23, 23,    │          0 │ batch_normalizat… │
│                     │ 16)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_109 (Conv2D) │ (None, 23, 23,    │      1,740 │ re_lu[0][0]       │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate         │ (None, 23, 23,    │          0 │ conv2d_108[0][0], │
│ (Concatenate)       │ 28)               │            │ conv2d_109[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 23,    │        112 │ concatenate[0][0] │
│ (BatchNormalizatio… │ 28)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_1 (ReLU)      │ (None, 23, 23,    │          0 │ batch_normalizat… │
│                     │ 28)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_110 (Conv2D) │ (None, 23, 23,    │      3,036 │ re_lu_1[0][0]     │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_1       │ (None, 23, 23,    │          0 │ concatenate[0][0… │
│ (Concatenate)       │ 40)               │            │ conv2d_110[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 23,    │        160 │ concatenate_1[0]… │
│ (BatchNormalizatio… │ 40)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_2 (ReLU)      │ (None, 23, 23,    │          0 │ batch_normalizat… │
│                     │ 40)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_111 (Conv2D) │ (None, 23, 23,    │      4,332 │ re_lu_2[0][0]     │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_2       │ (None, 23, 23,    │          0 │ concatenate_1[0]… │
│ (Concatenate)       │ 52)               │            │ conv2d_111[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d_32    │ (None, 23, 11,    │          0 │ concatenate_2[0]… │
│ (MaxPooling2D)      │ 52)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 11,    │        208 │ max_pooling2d_32… │
│ (BatchNormalizatio… │ 52)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_3 (ReLU)      │ (None, 23, 11,    │          0 │ batch_normalizat… │
│                     │ 52)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_112 (Conv2D) │ (None, 23, 11,    │      5,628 │ re_lu_3[0][0]     │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_3       │ (None, 23, 11,    │          0 │ max_pooling2d_32… │
│ (Concatenate)       │ 64)               │            │ conv2d_112[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 11,    │        256 │ concatenate_3[0]… │
│ (BatchNormalizatio… │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_4 (ReLU)      │ (None, 23, 11,    │          0 │ batch_normalizat… │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_113 (Conv2D) │ (None, 23, 11,    │      6,924 │ re_lu_4[0][0]     │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_4       │ (None, 23, 11,    │          0 │ concatenate_3[0]… │
│ (Concatenate)       │ 76)               │            │ conv2d_113[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 11,    │        304 │ concatenate_4[0]… │
│ (BatchNormalizatio… │ 76)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_5 (ReLU)      │ (None, 23, 11,    │          0 │ batch_normalizat… │
│                     │ 76)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_114 (Conv2D) │ (None, 23, 11,    │      8,220 │ re_lu_5[0][0]     │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_5       │ (None, 23, 11,    │          0 │ concatenate_4[0]… │
│ (Concatenate)       │ 88)               │            │ conv2d_114[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ global_average_poo… │ (None, 88)        │          0 │ concatenate_5[0]… │
│ (GlobalAveragePool… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_67 (Dense)    │ (None, 64)        │      5,696 │ global_average_p… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_46          │ (None, 64)        │          0 │ dense_67[0][0]    │
│ (Dropout)           │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_68 (Dense)    │ (None, 11)        │        715 │ dropout_46[0][0]  │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 37,555 (146.70 KB)
 Trainable params: 37,003 (144.54 KB)
 Non-trainable params: 552 (2.16 KB)
Model: "functional_47"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input_layer_47      │ (None, 101, 101,  │          0 │ -                 │
│ (InputLayer)        │ 1)                │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_115 (Conv2D) │ (None, 101, 101,  │        160 │ input_layer_47[0… │
│                     │ 16)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 101,  │         64 │ conv2d_115[0][0]  │
│ (BatchNormalizatio… │ 16)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_6 (ReLU)      │ (None, 101, 101,  │          0 │ batch_normalizat… │
│                     │ 16)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_116 (Conv2D) │ (None, 101, 101,  │      1,740 │ re_lu_6[0][0]     │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_6       │ (None, 101, 101,  │          0 │ conv2d_115[0][0], │
│ (Concatenate)       │ 28)               │            │ conv2d_116[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 101,  │        112 │ concatenate_6[0]… │
│ (BatchNormalizatio… │ 28)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_7 (ReLU)      │ (None, 101, 101,  │          0 │ batch_normalizat… │
│                     │ 28)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_117 (Conv2D) │ (None, 101, 101,  │      3,036 │ re_lu_7[0][0]     │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_7       │ (None, 101, 101,  │          0 │ concatenate_6[0]… │
│ (Concatenate)       │ 40)               │            │ conv2d_117[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 101,  │        160 │ concatenate_7[0]… │
│ (BatchNormalizatio… │ 40)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_8 (ReLU)      │ (None, 101, 101,  │          0 │ batch_normalizat… │
│                     │ 40)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_118 (Conv2D) │ (None, 101, 101,  │      4,332 │ re_lu_8[0][0]     │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_8       │ (None, 101, 101,  │          0 │ concatenate_7[0]… │
│ (Concatenate)       │ 52)               │            │ conv2d_118[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d_33    │ (None, 101, 50,   │          0 │ concatenate_8[0]… │
│ (MaxPooling2D)      │ 52)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 50,   │        208 │ max_pooling2d_33… │
│ (BatchNormalizatio… │ 52)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_9 (ReLU)      │ (None, 101, 50,   │          0 │ batch_normalizat… │
│                     │ 52)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_119 (Conv2D) │ (None, 101, 50,   │      5,628 │ re_lu_9[0][0]     │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_9       │ (None, 101, 50,   │          0 │ max_pooling2d_33… │
│ (Concatenate)       │ 64)               │            │ conv2d_119[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 50,   │        256 │ concatenate_9[0]… │
│ (BatchNormalizatio… │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_10 (ReLU)     │ (None, 101, 50,   │          0 │ batch_normalizat… │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_120 (Conv2D) │ (None, 101, 50,   │      6,924 │ re_lu_10[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_10      │ (None, 101, 50,   │          0 │ concatenate_9[0]… │
│ (Concatenate)       │ 76)               │            │ conv2d_120[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 50,   │        304 │ concatenate_10[0… │
│ (BatchNormalizatio… │ 76)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_11 (ReLU)     │ (None, 101, 50,   │          0 │ batch_normalizat… │
│                     │ 76)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_121 (Conv2D) │ (None, 101, 50,   │      8,220 │ re_lu_11[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_11      │ (None, 101, 50,   │          0 │ concatenate_10[0… │
│ (Concatenate)       │ 88)               │            │ conv2d_121[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ global_average_poo… │ (None, 88)        │          0 │ concatenate_11[0… │
│ (GlobalAveragePool… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_69 (Dense)    │ (None, 64)        │      5,696 │ global_average_p… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_47          │ (None, 64)        │          0 │ dense_69[0][0]    │
│ (Dropout)           │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_70 (Dense)    │ (None, 11)        │        715 │ dropout_47[0][0]  │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 37,555 (146.70 KB)
 Trainable params: 37,003 (144.54 KB)
 Non-trainable params: 552 (2.16 KB)
In [ ]:
# ------------------------------Small------------------------------
small_densenet_history = small_densenet_model.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

small_history_dict['DenseNet'] = small_densenet_history.history

# ------------------------------Large------------------------------
large_densenet_history = large_densenet_model.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

large_history_dict['DenseNet'] = large_densenet_history.history
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 18s 29ms/step - accuracy: 0.1992 - loss: 2.2404 - val_accuracy: 0.0909 - val_loss: 5.9046 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.3300 - loss: 1.8944 - val_accuracy: 0.2041 - val_loss: 3.4966 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4359 - loss: 1.6202 - val_accuracy: 0.4418 - val_loss: 1.6227 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.4902 - loss: 1.4498 - val_accuracy: 0.2582 - val_loss: 2.7828 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5421 - loss: 1.3194 - val_accuracy: 0.5755 - val_loss: 1.2331 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5631 - loss: 1.2467 - val_accuracy: 0.3141 - val_loss: 2.3984 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5881 - loss: 1.1691 - val_accuracy: 0.2750 - val_loss: 6.2530 - learning_rate: 0.0010
Epoch 8/20
319/328 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6199 - loss: 1.1219
Epoch 8: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6203 - loss: 1.1206 - val_accuracy: 0.3900 - val_loss: 2.1949 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6743 - loss: 0.9559 - val_accuracy: 0.5814 - val_loss: 1.2654 - learning_rate: 5.0000e-04
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6864 - loss: 0.9254 - val_accuracy: 0.5464 - val_loss: 1.2873 - learning_rate: 5.0000e-04
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6914 - loss: 0.8922 - val_accuracy: 0.6073 - val_loss: 1.1625 - learning_rate: 5.0000e-04
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7005 - loss: 0.8620 - val_accuracy: 0.7427 - val_loss: 0.7916 - learning_rate: 5.0000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7095 - loss: 0.8525 - val_accuracy: 0.6555 - val_loss: 1.0677 - learning_rate: 5.0000e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7252 - loss: 0.8034 - val_accuracy: 0.7382 - val_loss: 0.7487 - learning_rate: 5.0000e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7186 - loss: 0.8082 - val_accuracy: 0.6409 - val_loss: 1.1873 - learning_rate: 5.0000e-04
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7427 - loss: 0.7606 - val_accuracy: 0.6200 - val_loss: 1.1299 - learning_rate: 5.0000e-04
Epoch 17/20
323/328 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7413 - loss: 0.7479
Epoch 17: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7413 - loss: 0.7479 - val_accuracy: 0.6768 - val_loss: 0.9252 - learning_rate: 5.0000e-04
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 39s 82ms/step - accuracy: 0.2171 - loss: 2.1935 - val_accuracy: 0.0909 - val_loss: 8.1020 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.3760 - loss: 1.7215 - val_accuracy: 0.3018 - val_loss: 2.0443 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.4666 - loss: 1.4835 - val_accuracy: 0.1886 - val_loss: 6.5382 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.4930 - loss: 1.4046 - val_accuracy: 0.4291 - val_loss: 1.7117 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.5306 - loss: 1.2862 - val_accuracy: 0.4023 - val_loss: 1.8823 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 46ms/step - accuracy: 0.5865 - loss: 1.1891 - val_accuracy: 0.4055 - val_loss: 1.6295 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.6067 - loss: 1.0852 - val_accuracy: 0.5636 - val_loss: 1.2996 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.6254 - loss: 1.0600 - val_accuracy: 0.5659 - val_loss: 1.2756 - learning_rate: 0.0010
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.6571 - loss: 0.9766 - val_accuracy: 0.6955 - val_loss: 0.8901 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 45ms/step - accuracy: 0.6663 - loss: 0.9188 - val_accuracy: 0.4073 - val_loss: 2.1295 - learning_rate: 0.0010
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 46ms/step - accuracy: 0.6958 - loss: 0.8733 - val_accuracy: 0.4500 - val_loss: 2.7203 - learning_rate: 0.0010
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.7168 - loss: 0.8161
Epoch 12: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 16s 47ms/step - accuracy: 0.7168 - loss: 0.8161 - val_accuracy: 0.6377 - val_loss: 1.0256 - learning_rate: 0.0010
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.7770 - loss: 0.6596 - val_accuracy: 0.7573 - val_loss: 0.7120 - learning_rate: 5.0000e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 46ms/step - accuracy: 0.7799 - loss: 0.6523 - val_accuracy: 0.7664 - val_loss: 0.6801 - learning_rate: 5.0000e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 16s 47ms/step - accuracy: 0.7875 - loss: 0.6309 - val_accuracy: 0.6327 - val_loss: 1.3623 - learning_rate: 5.0000e-04
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 46ms/step - accuracy: 0.7987 - loss: 0.5925 - val_accuracy: 0.6864 - val_loss: 0.9562 - learning_rate: 5.0000e-04
Epoch 17/20
327/328 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.8130 - loss: 0.5640
Epoch 17: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.8130 - loss: 0.5640 - val_accuracy: 0.6768 - val_loss: 1.1052 - learning_rate: 5.0000e-04
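The training histories collected into `small_history_dict` and `large_history_dict` above can be summarized numerically before plotting. The sketch below is a minimal, hypothetical example (the `demo` dict stands in for the real history dicts, which share the same `History.history` shape produced by Keras `fit()`):

```python
def best_val_accuracy(history_dict):
    """Return the highest validation accuracy recorded for each model."""
    return {name: max(h['val_accuracy']) for name, h in history_dict.items()}

# Hypothetical histories mimicking the structure of small_history_dict
demo = {
    'DenseNet': {'val_accuracy': [0.09, 0.58, 0.74]},
    'DenseNet with Augmented Data': {'val_accuracy': [0.09, 0.47, 0.62]},
}

print(best_val_accuracy(demo))
```

In the actual notebook this helper could be called as `best_val_accuracy(small_history_dict)` to compare architectures at a glance.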
In [ ]:
# ------------------------------Small------------------------------
aug_small_densenet_model = mini_densenet()
aug_small_densenet_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

aug_small_densenet_model.summary()

# ------------------------------Large------------------------------
aug_large_densenet_model = mini_densenet(input_shape=(101, 101, 1))
aug_large_densenet_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

aug_large_densenet_model.summary()
Model: "functional_48"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input_layer_48      │ (None, 23, 23, 1) │          0 │ -                 │
│ (InputLayer)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_122 (Conv2D) │ (None, 23, 23,    │        160 │ input_layer_48[0… │
│                     │ 16)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 23,    │         64 │ conv2d_122[0][0]  │
│ (BatchNormalizatio… │ 16)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_12 (ReLU)     │ (None, 23, 23,    │          0 │ batch_normalizat… │
│                     │ 16)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_123 (Conv2D) │ (None, 23, 23,    │      1,740 │ re_lu_12[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_12      │ (None, 23, 23,    │          0 │ conv2d_122[0][0], │
│ (Concatenate)       │ 28)               │            │ conv2d_123[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 23,    │        112 │ concatenate_12[0… │
│ (BatchNormalizatio… │ 28)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_13 (ReLU)     │ (None, 23, 23,    │          0 │ batch_normalizat… │
│                     │ 28)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_124 (Conv2D) │ (None, 23, 23,    │      3,036 │ re_lu_13[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_13      │ (None, 23, 23,    │          0 │ concatenate_12[0… │
│ (Concatenate)       │ 40)               │            │ conv2d_124[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 23,    │        160 │ concatenate_13[0… │
│ (BatchNormalizatio… │ 40)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_14 (ReLU)     │ (None, 23, 23,    │          0 │ batch_normalizat… │
│                     │ 40)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_125 (Conv2D) │ (None, 23, 23,    │      4,332 │ re_lu_14[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_14      │ (None, 23, 23,    │          0 │ concatenate_13[0… │
│ (Concatenate)       │ 52)               │            │ conv2d_125[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d_34    │ (None, 23, 11,    │          0 │ concatenate_14[0… │
│ (MaxPooling2D)      │ 52)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 11,    │        208 │ max_pooling2d_34… │
│ (BatchNormalizatio… │ 52)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_15 (ReLU)     │ (None, 23, 11,    │          0 │ batch_normalizat… │
│                     │ 52)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_126 (Conv2D) │ (None, 23, 11,    │      5,628 │ re_lu_15[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_15      │ (None, 23, 11,    │          0 │ max_pooling2d_34… │
│ (Concatenate)       │ 64)               │            │ conv2d_126[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 11,    │        256 │ concatenate_15[0… │
│ (BatchNormalizatio… │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_16 (ReLU)     │ (None, 23, 11,    │          0 │ batch_normalizat… │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_127 (Conv2D) │ (None, 23, 11,    │      6,924 │ re_lu_16[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_16      │ (None, 23, 11,    │          0 │ concatenate_15[0… │
│ (Concatenate)       │ 76)               │            │ conv2d_127[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 23, 11,    │        304 │ concatenate_16[0… │
│ (BatchNormalizatio… │ 76)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_17 (ReLU)     │ (None, 23, 11,    │          0 │ batch_normalizat… │
│                     │ 76)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_128 (Conv2D) │ (None, 23, 11,    │      8,220 │ re_lu_17[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_17      │ (None, 23, 11,    │          0 │ concatenate_16[0… │
│ (Concatenate)       │ 88)               │            │ conv2d_128[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ global_average_poo… │ (None, 88)        │          0 │ concatenate_17[0… │
│ (GlobalAveragePool… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_71 (Dense)    │ (None, 64)        │      5,696 │ global_average_p… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_48          │ (None, 64)        │          0 │ dense_71[0][0]    │
│ (Dropout)           │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_72 (Dense)    │ (None, 11)        │        715 │ dropout_48[0][0]  │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 37,555 (146.70 KB)
 Trainable params: 37,003 (144.54 KB)
 Non-trainable params: 552 (2.16 KB)
Model: "functional_49"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input_layer_49      │ (None, 101, 101,  │          0 │ -                 │
│ (InputLayer)        │ 1)                │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_129 (Conv2D) │ (None, 101, 101,  │        160 │ input_layer_49[0… │
│                     │ 16)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 101,  │         64 │ conv2d_129[0][0]  │
│ (BatchNormalizatio… │ 16)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_18 (ReLU)     │ (None, 101, 101,  │          0 │ batch_normalizat… │
│                     │ 16)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_130 (Conv2D) │ (None, 101, 101,  │      1,740 │ re_lu_18[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_18      │ (None, 101, 101,  │          0 │ conv2d_129[0][0], │
│ (Concatenate)       │ 28)               │            │ conv2d_130[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 101,  │        112 │ concatenate_18[0… │
│ (BatchNormalizatio… │ 28)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_19 (ReLU)     │ (None, 101, 101,  │          0 │ batch_normalizat… │
│                     │ 28)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_131 (Conv2D) │ (None, 101, 101,  │      3,036 │ re_lu_19[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_19      │ (None, 101, 101,  │          0 │ concatenate_18[0… │
│ (Concatenate)       │ 40)               │            │ conv2d_131[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 101,  │        160 │ concatenate_19[0… │
│ (BatchNormalizatio… │ 40)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_20 (ReLU)     │ (None, 101, 101,  │          0 │ batch_normalizat… │
│                     │ 40)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_132 (Conv2D) │ (None, 101, 101,  │      4,332 │ re_lu_20[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_20      │ (None, 101, 101,  │          0 │ concatenate_19[0… │
│ (Concatenate)       │ 52)               │            │ conv2d_132[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d_35    │ (None, 101, 50,   │          0 │ concatenate_20[0… │
│ (MaxPooling2D)      │ 52)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 50,   │        208 │ max_pooling2d_35… │
│ (BatchNormalizatio… │ 52)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_21 (ReLU)     │ (None, 101, 50,   │          0 │ batch_normalizat… │
│                     │ 52)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_133 (Conv2D) │ (None, 101, 50,   │      5,628 │ re_lu_21[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_21      │ (None, 101, 50,   │          0 │ max_pooling2d_35… │
│ (Concatenate)       │ 64)               │            │ conv2d_133[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 50,   │        256 │ concatenate_21[0… │
│ (BatchNormalizatio… │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_22 (ReLU)     │ (None, 101, 50,   │          0 │ batch_normalizat… │
│                     │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_134 (Conv2D) │ (None, 101, 50,   │      6,924 │ re_lu_22[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_22      │ (None, 101, 50,   │          0 │ concatenate_21[0… │
│ (Concatenate)       │ 76)               │            │ conv2d_134[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ batch_normalizatio… │ (None, 101, 50,   │        304 │ concatenate_22[0… │
│ (BatchNormalizatio… │ 76)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ re_lu_23 (ReLU)     │ (None, 101, 50,   │          0 │ batch_normalizat… │
│                     │ 76)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_135 (Conv2D) │ (None, 101, 50,   │      8,220 │ re_lu_23[0][0]    │
│                     │ 12)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_23      │ (None, 101, 50,   │          0 │ concatenate_22[0… │
│ (Concatenate)       │ 88)               │            │ conv2d_135[0][0]  │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ global_average_poo… │ (None, 88)        │          0 │ concatenate_23[0… │
│ (GlobalAveragePool… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_73 (Dense)    │ (None, 64)        │      5,696 │ global_average_p… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_49          │ (None, 64)        │          0 │ dense_73[0][0]    │
│ (Dropout)           │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_74 (Dense)    │ (None, 11)        │        715 │ dropout_49[0][0]  │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 37,555 (146.70 KB)
 Trainable params: 37,003 (144.54 KB)
 Non-trainable params: 552 (2.16 KB)
In [ ]:
# ------------------------------Small------------------------------
aug_small_densenet_history = aug_small_densenet_model.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

small_history_dict['DenseNet with Augmented Data'] = aug_small_densenet_history.history

# ------------------------------Large------------------------------
aug_large_densenet_history = aug_large_densenet_model.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)

large_history_dict['DenseNet with Augmented Data'] = aug_large_densenet_history.history
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 17s 26ms/step - accuracy: 0.1970 - loss: 2.2269 - val_accuracy: 0.0909 - val_loss: 4.2205 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.3451 - loss: 1.8844 - val_accuracy: 0.1768 - val_loss: 3.6344 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.4385 - loss: 1.6101 - val_accuracy: 0.3036 - val_loss: 2.2559 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4908 - loss: 1.4303 - val_accuracy: 0.4486 - val_loss: 2.0633 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5318 - loss: 1.3078 - val_accuracy: 0.4741 - val_loss: 1.7668 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5789 - loss: 1.2072 - val_accuracy: 0.6218 - val_loss: 1.1075 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5984 - loss: 1.1355 - val_accuracy: 0.3705 - val_loss: 2.4384 - learning_rate: 0.0010
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6354 - loss: 1.0629 - val_accuracy: 0.4941 - val_loss: 1.5925 - learning_rate: 0.0010
Epoch 9/20
320/328 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6480 - loss: 1.0318
Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6482 - loss: 1.0314 - val_accuracy: 0.4523 - val_loss: 1.7354 - learning_rate: 0.0010
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6861 - loss: 0.9208 - val_accuracy: 0.5936 - val_loss: 1.2444 - learning_rate: 5.0000e-04
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7056 - loss: 0.8801 - val_accuracy: 0.5818 - val_loss: 1.3433 - learning_rate: 5.0000e-04
Epoch 12/20
320/328 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7092 - loss: 0.8502
Epoch 12: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7092 - loss: 0.8503 - val_accuracy: 0.5859 - val_loss: 1.3310 - learning_rate: 5.0000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7322 - loss: 0.8058 - val_accuracy: 0.7173 - val_loss: 0.8433 - learning_rate: 2.5000e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7342 - loss: 0.7723 - val_accuracy: 0.7332 - val_loss: 0.7947 - learning_rate: 2.5000e-04
Epoch 15/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7469 - loss: 0.7575 - val_accuracy: 0.7214 - val_loss: 0.8824 - learning_rate: 2.5000e-04
Epoch 16/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7447 - loss: 0.7501 - val_accuracy: 0.7191 - val_loss: 0.8465 - learning_rate: 2.5000e-04
Epoch 17/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7482 - loss: 0.7494 - val_accuracy: 0.7423 - val_loss: 0.7785 - learning_rate: 2.5000e-04
Epoch 18/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7597 - loss: 0.7250 - val_accuracy: 0.7541 - val_loss: 0.7254 - learning_rate: 2.5000e-04
Epoch 19/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7552 - loss: 0.7328 - val_accuracy: 0.7495 - val_loss: 0.7868 - learning_rate: 2.5000e-04
Epoch 20/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7567 - loss: 0.7185 - val_accuracy: 0.7741 - val_loss: 0.6897 - learning_rate: 2.5000e-04
Epoch 1/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 31s 65ms/step - accuracy: 0.2395 - loss: 2.1515 - val_accuracy: 0.0909 - val_loss: 4.5025 - learning_rate: 0.0010
Epoch 2/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 16s 48ms/step - accuracy: 0.4017 - loss: 1.6754 - val_accuracy: 0.2427 - val_loss: 3.1961 - learning_rate: 0.0010
Epoch 3/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 46ms/step - accuracy: 0.4706 - loss: 1.4717 - val_accuracy: 0.3914 - val_loss: 1.7922 - learning_rate: 0.0010
Epoch 4/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.5174 - loss: 1.3745 - val_accuracy: 0.2936 - val_loss: 2.6981 - learning_rate: 0.0010
Epoch 5/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.5553 - loss: 1.2520 - val_accuracy: 0.3127 - val_loss: 2.9928 - learning_rate: 0.0010
Epoch 6/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 0.5891 - loss: 1.1725
Epoch 6: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 46ms/step - accuracy: 0.5891 - loss: 1.1725 - val_accuracy: 0.4595 - val_loss: 2.5955 - learning_rate: 0.0010
Epoch 7/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 16s 47ms/step - accuracy: 0.6392 - loss: 1.0507 - val_accuracy: 0.5768 - val_loss: 1.2177 - learning_rate: 5.0000e-04
Epoch 8/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.6510 - loss: 1.0037 - val_accuracy: 0.3355 - val_loss: 4.3405 - learning_rate: 5.0000e-04
Epoch 9/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.6674 - loss: 0.9675 - val_accuracy: 0.5509 - val_loss: 1.3227 - learning_rate: 5.0000e-04
Epoch 10/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 0.6828 - loss: 0.9172
Epoch 10: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 47ms/step - accuracy: 0.6828 - loss: 0.9172 - val_accuracy: 0.4532 - val_loss: 2.1191 - learning_rate: 5.0000e-04
Epoch 11/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.7049 - loss: 0.8696 - val_accuracy: 0.7400 - val_loss: 0.8087 - learning_rate: 2.5000e-04
Epoch 12/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.7361 - loss: 0.8097 - val_accuracy: 0.8055 - val_loss: 0.5951 - learning_rate: 2.5000e-04
Epoch 13/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 45ms/step - accuracy: 0.7441 - loss: 0.7709 - val_accuracy: 0.5795 - val_loss: 1.2081 - learning_rate: 2.5000e-04
Epoch 14/20
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.7505 - loss: 0.7427 - val_accuracy: 0.4923 - val_loss: 1.8497 - learning_rate: 2.5000e-04
Epoch 15/20
327/328 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 0.7628 - loss: 0.7160
Epoch 15: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.7628 - loss: 0.7161 - val_accuracy: 0.6718 - val_loss: 0.9451 - learning_rate: 2.5000e-04

Model Evaluation¶

In [ ]:
small_results_df = pd.DataFrame()

for name, history in small_history_dict.items():
    final_train_acc = history['accuracy'][-1]
    final_val_acc = history['val_accuracy'][-1]
    new_row = pd.DataFrame({
        'Model': [name],
        'Final Train Accuracy': [final_train_acc],
        'Final Val Accuracy': [final_val_acc],
    })
    small_results_df = pd.concat([small_results_df, new_row], ignore_index=True)

# Sort results by validation accuracy (optional)
small_results_df = small_results_df.sort_values(by='Final Val Accuracy', ascending=False).reset_index(drop=True)

# Function to highlight top 3 and bottom 3
def highlight_top_bottom(s):
    sorted_idx = s.sort_values(ascending=False).index
    styles = [''] * len(s)

    # Top 3: shades of green
    if len(s) >= 1: styles[sorted_idx[0]] = 'background-color: #a1d99b'  # top
    if len(s) >= 2: styles[sorted_idx[1]] = 'background-color: #c7e9c0'
    if len(s) >= 3: styles[sorted_idx[2]] = 'background-color: #e5f5e0'

    # Bottom 3: shades of red
    if len(s) >= 1: styles[sorted_idx[-1]] = 'background-color: #fc9272'  # worst
    if len(s) >= 2: styles[sorted_idx[-2]] = 'background-color: #fcbba1'
    if len(s) >= 3: styles[sorted_idx[-3]] = 'background-color: #fee0d2'

    return styles

# Apply styling
styled_df = small_results_df.style.apply(highlight_top_bottom, subset=['Final Train Accuracy'])
styled_df = styled_df.apply(highlight_top_bottom, subset=['Final Val Accuracy'])

display(styled_df)
  Model Final Train Accuracy Final Val Accuracy
0 Custom CNN 0.921860 0.907727
1 VGG 0.915190 0.844091
2 ResNet50 with Augmented Data 0.956737 0.830000
3 DenseNet with Augmented Data 0.755289 0.774091
4 ResNet50 0.935201 0.734545
5 DenseNet 0.741471 0.676818
6 MobileNet 0.735563 0.528636
7 Custom CNN with Augmented Data 0.786545 0.519545
8 VGG with Augmented Data 0.638841 0.513182
9 Dummy Baseline 0.377454 0.295455
10 Dummy Baseline with Augmented Data 0.251953 0.212273
11 MobileNet with Augmented Data 0.666857 0.145000

Top models for 23x23 Images¶

We can observe that the top 3 models (based on validation accuracy) are:

  • Custom CNN
  • VGG
  • ResNet50 with Augmented Data

while the bottom 3 are:

  • Dummy Baseline with Augmented Data
  • Dummy Baseline
  • MobileNet with Augmented Data

We can observe that the augmented models performed slightly worse than their non-augmented counterparts. This could be due to over-aggressive transformations: rotations, shears, zooms, or random crops can destroy the tiny features the network needs, especially at 23x23 where every pixel matters.
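A rough illustration of why geometric augmentation is risky at this resolution (plain NumPy, with a hypothetical 2-pixel translation; the image content is made up):

```python
import numpy as np

# A hypothetical 23x23 single-channel image.
img = np.arange(23 * 23, dtype=np.float32).reshape(23, 23)

def shift_right(image, pixels):
    """Translate the image right, zero-filling the vacated columns."""
    out = np.zeros_like(image)
    out[:, pixels:] = image[:, :-pixels]
    return out

shifted = shift_right(img, 2)
# Even a 2-pixel shift blanks ~9% of a 23-pixel-wide image,
# while the same shift on a 101-pixel-wide image loses only ~2%.
print(round(2 / 23, 3), round(2 / 101, 3))  # 0.087 0.02
```

The same absolute transformation therefore removes a much larger fraction of the information at 23x23 than at 101x101, which is consistent with the results in the table above.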

In [ ]:
large_results_df = pd.DataFrame()

for name, history in large_history_dict.items():
    final_train_acc = history['accuracy'][-1]
    final_val_acc = history['val_accuracy'][-1]
    new_row = pd.DataFrame({
        'Model': [name],
        'Final Train Accuracy': [final_train_acc],
        'Final Val Accuracy': [final_val_acc],
    })
    large_results_df = pd.concat([large_results_df, new_row], ignore_index=True)

# Sort results by validation accuracy (optional)
large_results_df = large_results_df.sort_values(by='Final Val Accuracy', ascending=False).reset_index(drop=True)

# Function to highlight top 3 and bottom 3
def highlight_top_bottom(s):
    sorted_idx = s.sort_values(ascending=False).index
    styles = [''] * len(s)

    # Top 3: shades of green
    if len(s) >= 1: styles[sorted_idx[0]] = 'background-color: #a1d99b'  # top
    if len(s) >= 2: styles[sorted_idx[1]] = 'background-color: #c7e9c0'
    if len(s) >= 3: styles[sorted_idx[2]] = 'background-color: #e5f5e0'

    # Bottom 3: shades of red
    if len(s) >= 1: styles[sorted_idx[-1]] = 'background-color: #fc9272'  # worst
    if len(s) >= 2: styles[sorted_idx[-2]] = 'background-color: #fcbba1'
    if len(s) >= 3: styles[sorted_idx[-3]] = 'background-color: #fee0d2'

    return styles

# Apply styling
styled_df = large_results_df.style.apply(highlight_top_bottom, subset=['Final Train Accuracy'])
styled_df = styled_df.apply(highlight_top_bottom, subset=['Final Val Accuracy'])

display(styled_df)
  Model Final Train Accuracy Final Val Accuracy
0 Custom CNN 0.983133 0.967273
1 ResNet50 0.947208 0.931818
2 VGG 0.870593 0.922727
3 ResNet50 with Augmented Data 0.966743 0.904545
4 Custom CNN with Augmented Data 0.935582 0.734091
5 VGG with Augmented Data 0.700496 0.693636
6 DenseNet 0.810082 0.676818
7 DenseNet with Augmented Data 0.758052 0.671818
8 MobileNet 0.726415 0.379545
9 Dummy Baseline 0.554888 0.326364
10 MobileNet with Augmented Data 0.765771 0.308182
11 Dummy Baseline with Augmented Data 0.333810 0.171364

Top models for 101x101 Images¶

We can observe that the top 3 models (based on validation accuracy) are:

  • Custom CNN
  • ResNet50
  • VGG

while the bottom 3 are:

  • Dummy Baseline
  • MobileNet with Augmented Data
  • Dummy Baseline with Augmented Data

On the 101x101 images, our augmented models performed comparatively better. This is likely because the higher resolution leaves more pixels intact after each transformation, so augmentation adds variety without destroying the discriminative features.

Hyperparameter Tuning¶

What metric will we use to hypertune?¶

Earlier we discussed the various metrics and their best use cases.

Given our earlier balancing of the dataset, and the fact that this 11-way vegetable classification task is not safety-critical, plain accuracy is a perfectly reasonable single "how often am I right" measure, since class frequencies will not skew it. Hence, we will primarily use accuracy as our objective metric when determining the best tuned model.
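A minimal sketch of the point above (pure Python, with made-up labels and a deliberately degenerate classifier): on a balanced label set, accuracy immediately exposes a model that only predicts the majority class, because no class dominates the denominator.

```python
# Balanced toy labels: 4 samples from each of 3 classes (illustrative only).
y_true = [0] * 4 + [1] * 4 + [2] * 4
# A hypothetical classifier that always predicts class 0.
y_pred = [0] * 12

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# With balanced classes, an "always predict one class" model can never
# score above 1/n_classes, so plain accuracy is an honest measure here.
print(accuracy)  # 4/12, i.e. about 0.33
```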

23x23 Dataset Hypertuning¶

Here we will be hypertuning our top model trained on the 23x23 images, which is our Custom CNN trained on non-augmented data.

23x23 CNN non-augmented tuning¶

filters_block1 = [32, 64] Why: The first convolutional block typically learns basic low-level features, such as edges, textures, and simple shapes.

32 filters help keep the model lightweight, which is crucial when working with smaller input images (e.g., 23x23 pixels) as it reduces computational overhead.

64 filters offer more capacity for feature extraction, which allows the network to learn richer representations, though it comes at the cost of increased memory usage and computation.

Balancing these values helps strike a good trade-off between model capacity (the ability to learn complex features) and the risk of overfitting, especially when dealing with small datasets.




filters_block2 = [64, 128] Why: The second convolutional block captures more complex, higher-level patterns, like shapes and textures (e.g., the outline of a leaf or surface patterns).

After pooling, the spatial resolution of the feature maps reduces, so having a larger filter count (128) allows the network to capture more abstract and complex features from the reduced input space.

Increasing the filter count here helps compensate for the reduction in spatial size while enabling the network to extract richer, more detailed representations of the input data.




dense_units = [64, 128] Why: Determines the capacity of your fully connected classifier.

64 units are typically enough for simpler tasks or when the data is less complex, helping keep the model compact and efficient.

128 units provide more capacity, enabling the model to learn more detailed and sophisticated representations at the cost of additional computational resources.

Tuning this parameter allows the network to find a good balance between underfitting (too few units) and overfitting (too many units), helping the model generalize well to unseen data without becoming overly complex.




dropout_rate = [0.2, 0.3, 0.4, 0.5] Why: Dropout is a regularization technique that helps prevent overfitting by randomly setting a fraction of the neurons to zero during training.

For small input sizes (like 23x23 images), dropout is particularly important because it reduces the chance of the model memorizing noisy or irrelevant patterns from the data.

Exploring a range of dropout rates (from mild to aggressive) allows you to find the right level of regularization. If the dropout rate is too low, the model might overfit, but if it's too high, the model might underfit and struggle to learn useful patterns.
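The mechanism can be sketched in plain NumPy (inverted dropout, the form Keras applies during training; the 0.4 rate and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.4  # fraction of units to drop, as in the search space above

x = np.ones(1000, dtype=np.float32)
# Inverted dropout: zero out ~`rate` of the units, then rescale the
# survivors by 1/(1-rate) so the expected activation is unchanged,
# meaning no rescaling is needed at inference time.
mask = rng.random(x.shape) >= rate
y = x * mask / (1.0 - rate)

print(round(float(mask.mean()), 2))  # fraction kept, close to 0.6
print(round(float(y.mean()), 2))     # expected activation preserved, close to 1.0
```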




l2_reg = log scale from 1e-5 to 1e-2 Why: L2 regularization penalizes large weights by adding a penalty to the loss function, which helps prevent overfitting by forcing the model to learn simpler, smaller weight values.

Logarithmic scale is used here because small changes in regularization strength (especially in lower values) can have a significant impact on model performance. It gives you finer control over the regularization process.

Choosing the right L2 regularization strength is crucial for preventing the model from becoming overly complex and fitting noise, while still allowing it to capture meaningful patterns.




Activation functions (relu, leaky_relu) Why: The activation function introduces nonlinearity into the network, enabling it to learn complex patterns.

ReLU is the default choice because it's efficient and works well in many scenarios, but it can suffer from the "dying ReLU" problem, where a neuron whose pre-activation stays negative outputs zero and receives zero gradient, so it stops learning.

Leaky ReLU mitigates this issue by allowing a small negative gradient when the input is less than zero, ensuring neurons continue to learn even if their activations are negative.

Tuning activation functions across convolutional and dense layers helps the model adapt to the nonlinearities in the data, especially for tasks with low-resolution images where learning fine-grained features is more challenging.
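The difference between the two candidates can be shown in a few lines of NumPy (the 0.3 slope mirrors Keras's default for the LeakyReLU layer; the inputs are made up):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.3):  # 0.3 matches Keras's default negative slope
    return np.where(x >= 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # negatives clamp to 0, so their gradient is 0 ("dying ReLU")
print(leaky_relu(x))  # negatives keep a small slope, so gradients still flow
```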




optimizer = ['adam', 'adamax', 'nadam', 'sgd'] Why: The optimizer controls how the model's weights are updated during training. Different optimizers behave differently, especially when dealing with small images.

Adam is a widely used baseline optimizer that combines adaptive learning rates with momentum, making it a reliable choice for many tasks.

Adamax is a variant of Adam that can sometimes perform better with sparse gradients, which may occur when working with images that have many flat regions (e.g., backgrounds).

Nadam combines Adam with Nesterov momentum, which can speed up convergence and sometimes lead to better performance.

SGD (Stochastic Gradient Descent) is a classic optimizer that often requires more tuning (learning rate, momentum), but can provide better generalization when used correctly.

Exploring these optimizers helps find the one that best suits the learning dynamics of small image tasks.




learning_rate = log scale from 1e-5 to 1e-2 Why: The learning rate controls the size of the steps the optimizer takes when updating model parameters.

A log scale is used because the learning rate has a large effect on training dynamics. Very small learning rates might lead to slow convergence, while too high a learning rate could cause the model to miss the optimal solution.

Capturing both slow and fast learning behaviors with this log scale helps find the optimal balance between stability (avoiding overshooting the minimum) and speed (converging efficiently).
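Why the log scale matters can be demonstrated directly (pure Python; the sampler mirrors what `hp.Float(..., sampling='log')` does, and the seed is illustrative):

```python
import math
import random

lo, hi = 1e-5, 1e-2  # the same range tuned below

# Uniform sampling in log space gives each decade equal probability...
random.seed(0)
log_samples = [10 ** random.uniform(math.log10(lo), math.log10(hi))
               for _ in range(10_000)]
frac_tiny = sum(lo <= s < 1e-4 for s in log_samples) / len(log_samples)

# ...whereas uniform sampling in linear space almost never visits
# the smallest learning rates at all.
random.seed(0)
lin_samples = [random.uniform(lo, hi) for _ in range(10_000)]
frac_tiny_linear = sum(lo <= s < 1e-4 for s in lin_samples) / len(lin_samples)

print(round(frac_tiny, 2), round(frac_tiny_linear, 3))  # roughly 0.33 vs 0.009
```

Without the log scale, the tuner would effectively never try learning rates below 1e-4, defeating the purpose of including them in the range.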

In [ ]:
def build_small_custom_cnn(hp):
    filters_block1 = hp.Choice('filters_block1', values=[32, 64])
    filters_block2 = hp.Choice('filters_block2', values=[64, 128])
    dense_units = hp.Choice('dense_units', values=[64, 128])
    dropout_rate = hp.Float('dropout_rate', min_value=0.2, max_value=0.5, step=0.1)
    l2_reg = hp.Float('l2_reg', min_value=1e-5, max_value=1e-2, sampling='log')

    # Tunable activation functions
    act_block1 = hp.Choice('activation_block1', values=['relu', 'leaky_relu'])
    act_block2 = hp.Choice('activation_block2', values=['relu', 'leaky_relu'])
    act_block3 = hp.Choice('activation_block3', values=['relu', 'leaky_relu'])
    act_dense = hp.Choice('activation_dense', values=['relu', 'leaky_relu'])

    # Tunable optimizer and learning rate
    optimizer_choice = hp.Choice('optimizer', values=['adam', 'adamax', 'nadam', 'sgd'])
    learning_rate = hp.Float('learning_rate', min_value=1e-5, max_value=1e-2, sampling='log')

    # Build optimizer with learning rate
    if optimizer_choice == 'adam':
        optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    elif optimizer_choice == 'adamax':
        optimizer = tf.keras.optimizers.Adamax(learning_rate=learning_rate)
    elif optimizer_choice == 'nadam':
        optimizer = tf.keras.optimizers.Nadam(learning_rate=learning_rate)
    elif optimizer_choice == 'sgd':
        optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)

    reg = tf.keras.regularizers.l2(l2_reg)

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(23, 23, 1)),

        # Block 1
        tf.keras.layers.Conv2D(filters_block1, (3, 3), activation=act_block1, padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters_block1, (3, 3), activation=act_block1, padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(dropout_rate),

        # Block 2
        tf.keras.layers.Conv2D(filters_block2, (3, 3), activation=act_block2, padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters_block2, (3, 3), activation=act_block2, padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(dropout_rate),

        # Block 3
        tf.keras.layers.Conv2D(128, (3, 3), activation=act_block3, padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(128, (3, 3), activation=act_block3, padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(dropout_rate),

        # Classifier
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(dense_units, activation=act_dense, kernel_regularizer=reg),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(11, activation='softmax')
    ])

    model.compile(
        optimizer=optimizer,
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    return model
In [ ]:
small_cnn_tuner = RandomSearch(
    build_small_custom_cnn,
    objective='val_accuracy',
    max_trials=10,
    directory='cnn_tuner',
    project_name='small_non_augmented_cnn_5'
)


small_cnn_tuner.search(small_train,
                 validation_data=small_val,
                 epochs=20,
                 callbacks=[early_stop, reduce_lr])
Trial 10 Complete [00h 01m 21s]
val_accuracy: 0.34909090399742126

Best val_accuracy So Far: 0.9527272582054138
Total elapsed time: 00h 15m 12s

Note: Hyperparameter tuning was conducted iteratively to optimize model performance. As a result, the hyperparameters presented here may differ from those used in subsequent stages, reflecting configurations that yielded better performance during earlier evaluations.

In [ ]:
small_tuned_cnn = small_cnn_tuner.get_best_models(num_models=1)[0]
tuned_cnn_hyperparams = small_cnn_tuner.get_best_hyperparameters(1)[0]

print("Best hyperparameters:")
print(tuned_cnn_hyperparams.values)
Best hyperparameters:
{'filters_block1': 64, 'filters_block2': 128, 'dense_units': 128, 'dropout_rate': 0.4, 'l2_reg': 1.3179456160919495e-05, 'activation_block1': 'relu', 'activation_block2': 'relu', 'activation_block3': 'relu', 'activation_dense': 'relu', 'optimizer': 'adam', 'learning_rate': 0.0014127161702417554}
/usr/local/lib/python3.11/dist-packages/keras/src/saving/saving_lib.py:757: UserWarning: Skipping variable loading for optimizer 'adam', because it has 2 variables whereas the saved optimizer has 58 variables. 
  saveable.load_own_variables(weights_store.get(inner_path))

101x101 Dataset Hypertuning¶

Here we will be hypertuning our top model trained on the 101x101 images, which is our Custom CNN trained on non-augmented data.

101x101 CNN non-augmented tuning¶

filters_block1 = [32, 64] Why: The first convolutional block typically learns basic low-level features, such as edges, textures, and simple shapes.

32 filters help keep the model lightweight, which reduces computational overhead; this matters even more at 101x101, where the larger feature maps make each filter more expensive.

64 filters offer more capacity for feature extraction, which allows the network to learn richer representations, though it comes at the cost of increased memory usage and computation.

Balancing these values helps strike a good trade-off between model capacity (the ability to learn complex features) and the risk of overfitting, especially when dealing with small datasets.




filters_block2 = [64, 128] Why: The second convolutional block captures more complex, higher-level patterns, like shapes and textures (e.g., the outline of a leaf or surface patterns).

After pooling, the spatial resolution of the feature maps reduces, so having a larger filter count (128) allows the network to capture more abstract and complex features from the reduced input space.

Increasing the filter count here helps compensate for the reduction in spatial size while enabling the network to extract richer, more detailed representations of the input data.




dense_units = [64, 128] Why: Determines the capacity of your fully connected classifier.

64 units are typically enough for simpler tasks or when the data is less complex, helping keep the model compact and efficient.

128 units provide more capacity, enabling the model to learn more detailed and sophisticated representations at the cost of additional computational resources.

Tuning this parameter allows the network to find a good balance between underfitting (too few units) and overfitting (too many units), helping the model generalize well to unseen data without becoming overly complex.




dropout_rate = [0.2, 0.3, 0.4, 0.5] Why: Dropout is a regularization technique that helps prevent overfitting by randomly setting a fraction of the neurons to zero during training.

Dropout is particularly important here because it reduces the chance of the model memorizing noisy or irrelevant patterns from the data.

Exploring a range of dropout rates (from mild to aggressive) allows you to find the right level of regularization. If the dropout rate is too low, the model might overfit, but if it's too high, the model might underfit and struggle to learn useful patterns.




l2_reg = log scale from 1e-5 to 1e-2 Why: L2 regularization penalizes large weights by adding a penalty to the loss function, which helps prevent overfitting by forcing the model to learn simpler, smaller weight values.

Logarithmic scale is used here because small changes in regularization strength (especially in lower values) can have a significant impact on model performance. It gives you finer control over the regularization process.

Choosing the right L2 regularization strength is crucial for preventing the model from becoming overly complex and fitting noise, while still allowing it to capture meaningful patterns.




Activation functions (relu, leaky_relu) Why: The activation function introduces nonlinearity into the network, enabling it to learn complex patterns.

ReLU is the default choice because it's efficient and works well in many scenarios, but it can suffer from the "dying ReLU" problem, where a neuron whose pre-activation stays negative outputs zero and receives zero gradient, so it stops learning.

Leaky ReLU mitigates this issue by allowing a small negative gradient when the input is less than zero, ensuring neurons continue to learn even if their activations are negative.

Tuning activation functions across convolutional and dense layers helps the model adapt to the nonlinearities in the data, especially for tasks with low-resolution images where learning fine-grained features is more challenging.




optimizer = ['adam', 'adamax', 'nadam', 'sgd'] Why: The optimizer controls how the model's weights are updated during training. Different optimizers behave differently, especially when dealing with small images.

Adam is a widely used baseline optimizer that combines adaptive learning rates with momentum, making it a reliable choice for many tasks.

Adamax is a variant of Adam that can sometimes perform better with sparse gradients, which may occur when working with images that have many flat regions (e.g., backgrounds).

Nadam combines Adam with Nesterov momentum, which can speed up convergence and sometimes lead to better performance.

SGD (Stochastic Gradient Descent) is a classic optimizer that often requires more tuning (learning rate, momentum), but can provide better generalization when used correctly.

Exploring these optimizers helps find the one that best suits the learning dynamics of small image tasks.




learning_rate = log scale from 1e-5 to 1e-2 Why: The learning rate controls the size of the steps the optimizer takes when updating model parameters.

A log scale is used because the learning rate has a large effect on training dynamics. Very small learning rates might lead to slow convergence, while too high a learning rate could cause the model to miss the optimal solution.

Capturing both slow and fast learning behaviors with this log scale helps find the optimal balance between stability (avoiding overshooting the minimum) and speed (converging efficiently).

In [ ]:
def build_large_custom_cnn(hp):
    filters_block1 = hp.Choice('filters_block1', values=[32, 64])
    filters_block2 = hp.Choice('filters_block2', values=[64, 128])
    dense_units = hp.Choice('dense_units', values=[64, 128])
    dropout_rate = hp.Float('dropout_rate', min_value=0.2, max_value=0.5, step=0.1)
    l2_reg = hp.Float('l2_reg', min_value=1e-5, max_value=1e-2, sampling='log')

    # Tunable activation functions
    act_block1 = hp.Choice('activation_block1', values=['relu', 'leaky_relu'])
    act_block2 = hp.Choice('activation_block2', values=['relu', 'leaky_relu'])
    act_block3 = hp.Choice('activation_block3', values=['relu', 'leaky_relu'])
    act_dense = hp.Choice('activation_dense', values=['relu', 'leaky_relu'])

    # Tunable optimizer and learning rate
    optimizer_choice = hp.Choice('optimizer', values=['adam', 'adamax', 'nadam', 'sgd'])
    learning_rate = hp.Float('learning_rate', min_value=1e-5, max_value=1e-2, sampling='log')

    # Build optimizer with learning rate
    if optimizer_choice == 'adam':
        optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    elif optimizer_choice == 'adamax':
        optimizer = tf.keras.optimizers.Adamax(learning_rate=learning_rate)
    elif optimizer_choice == 'nadam':
        optimizer = tf.keras.optimizers.Nadam(learning_rate=learning_rate)
    elif optimizer_choice == 'sgd':
        optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)

    reg = tf.keras.regularizers.l2(l2_reg)

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(101, 101, 1)),

        # Block 1
        tf.keras.layers.Conv2D(filters_block1, (3, 3), activation=act_block1, padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters_block1, (3, 3), activation=act_block1, padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(dropout_rate),

        # Block 2
        tf.keras.layers.Conv2D(filters_block2, (3, 3), activation=act_block2, padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters_block2, (3, 3), activation=act_block2, padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(dropout_rate),

        # Block 3
        tf.keras.layers.Conv2D(128, (3, 3), activation=act_block3, padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(128, (3, 3), activation=act_block3, padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(dropout_rate),

        # Classifier
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(dense_units, activation=act_dense, kernel_regularizer=reg),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(11, activation='softmax')
    ])

    model.compile(
        optimizer=optimizer,
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    return model
In [ ]:
large_cnn_tuner = RandomSearch(
    build_large_custom_cnn,
    objective='val_accuracy',
    max_trials=5,
    directory='cnn_tuner',
    project_name='large_non_augmented_cnn_2'
)


large_cnn_tuner.search(large_train,
                 validation_data=large_val,
                 epochs=20,
                 callbacks=[early_stop, reduce_lr])
Trial 5 Complete [00h 10m 59s]
val_accuracy: 0.4854545593261719

Best val_accuracy So Far: 0.9654545187950134
Total elapsed time: 01h 03m 42s

Note: Hyperparameter tuning was conducted iteratively to optimize model performance. As a result, the hyperparameters presented here may differ from those used in subsequent stages, reflecting configurations that yielded better performance during earlier evaluations.

In [ ]:
large_tuned_cnn = large_cnn_tuner.get_best_models(num_models=1)[0]
tuned_cnn_hyperparams = large_cnn_tuner.get_best_hyperparameters(1)[0]

print("Best hyperparameters:")
print(tuned_cnn_hyperparams.values)
Best hyperparameters:
{'filters_block1': 32, 'filters_block2': 128, 'dense_units': 128, 'dropout_rate': 0.30000000000000004, 'l2_reg': 1.3442250945870623e-05, 'activation_block1': 'relu', 'activation_block2': 'relu', 'activation_block3': 'leaky_relu', 'activation_dense': 'leaky_relu', 'optimizer': 'nadam', 'learning_rate': 0.00040405637109264074}
/usr/local/lib/python3.11/dist-packages/keras/src/saving/saving_lib.py:757: UserWarning: Skipping variable loading for optimizer 'nadam', because it has 2 variables whereas the saved optimizer has 59 variables. 
  saveable.load_own_variables(weights_store.get(inner_path))

Best Model¶

Our early stopping callback for training our best models. We set the start_from_epoch parameter so that the model trains for a minimum number of epochs before early stopping can trigger; in our experiments, models that stopped before 20 epochs performed worse on the test set.

In [ ]:
custom_early_stop = tf.keras.callbacks.EarlyStopping(
    patience=5,
    min_delta=0.0001,
    restore_best_weights=True,
    monitor='val_loss',
    start_from_epoch=20
)
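The interaction between patience, min_delta, and start_from_epoch can be sketched in plain Python. This is an illustrative simulation of the stopping rule only (the val_loss traces are made up, not from our training runs), under the assumption that the callback ignores the monitored metric entirely before start_from_epoch:

```python
def stopped_epoch(val_losses, patience=5, min_delta=0.0001, start_from_epoch=20):
    """Return the 0-indexed epoch at which training would stop, or None.

    Illustrative sketch of EarlyStopping's rule; not the Keras implementation.
    """
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if epoch < start_from_epoch:
            continue  # metric is not tracked before start_from_epoch
        if loss < best - min_delta:
            best = loss  # meaningful improvement: reset the patience counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # patience exhausted: stop here
    return None  # ran all epochs without triggering

# A completely flat loss curve stops patience epochs after monitoring begins.
print(stopped_epoch([1.0] * 40))  # -> 25
# A steadily improving loss curve never triggers early stopping.
print(stopped_epoch([1.0 / (i + 1) for i in range(40)]))  # -> None
```

With start_from_epoch=20 and patience=5, even an immediately stagnant run trains for at least 25 epochs, which is the behaviour we rely on above.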

Here we define a function to plot the learning curves from the model's history.

In [ ]:
def plot_history(history, title="CNN Model", metric_name='accuracy'):
    # Ensure the history contains the correct keys for accuracy and loss
    acc = history.history.get(metric_name, [])
    val_acc = history.history.get(f'val_{metric_name}', [])
    loss = history.history.get('loss', [])
    val_loss = history.history.get('val_loss', [])

    # Generate a range for the number of epochs
    epochs_range = range(1, len(acc) + 1)

    plt.figure(figsize=(14, 5))

    # ----- Accuracy Plot ----- #
    plt.subplot(1, 2, 1)
    plt.plot(epochs_range, acc, label='Training Accuracy')
    plt.plot(epochs_range, val_acc, label='Validation Accuracy')
    plt.title(f'{title} - Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid(True)

    # ----- Loss Plot ----- #
    plt.subplot(1, 2, 2)
    plt.plot(epochs_range, loss, label='Training Loss')
    plt.plot(epochs_range, val_loss, label='Validation Loss')
    plt.title(f'{title} - Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

23x23 Images¶

Best CNN¶

In [ ]:
def build_small_custom_cnn_best():
    filters_block1 = 64
    filters_block2 = 64
    dense_units = 64
    dropout_rate = 0.2
    l2_reg = 3.7224610062669776e-05

    act_block1 = 'relu'
    act_block2 = 'leaky_relu'
    act_block3 = 'relu'
    act_dense = 'relu'

    learning_rate = 0.0017584971517999608
    optimizer = tf.keras.optimizers.Adamax(learning_rate=learning_rate)

    reg = tf.keras.regularizers.l2(l2_reg)

    def get_activation(act_name):
        if act_name == 'leaky_relu':
            return tf.keras.layers.LeakyReLU()
        else:
            return tf.keras.layers.Activation(act_name)

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(23, 23, 1)),

        # Block 1
        tf.keras.layers.Conv2D(filters_block1, (3, 3), padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_block1),
        tf.keras.layers.Conv2D(filters_block1, (3, 3), padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_block1),
        tf.keras.layers.Dropout(dropout_rate),

        # Block 2
        tf.keras.layers.Conv2D(filters_block2, (3, 3), padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_block2),
        tf.keras.layers.Conv2D(filters_block2, (3, 3), padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_block2),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(dropout_rate),

        # Block 3
        tf.keras.layers.Conv2D(128, (3, 3), padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_block3),
        tf.keras.layers.Conv2D(128, (3, 3), padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_block3),
        tf.keras.layers.Dropout(dropout_rate),

        # Classifier
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(dense_units, kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_dense),
        tf.keras.layers.Dropout(dropout_rate),

        tf.keras.layers.Dense(11, activation='softmax')
    ])

    model.compile(
        optimizer=optimizer,
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    return model
In [ ]:
best_small_cnn = build_small_custom_cnn_best()
best_small_cnn.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D)                 │ (None, 23, 23, 64)     │           640 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization             │ (None, 23, 23, 64)     │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ activation (Activation)         │ (None, 23, 23, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_1 (Conv2D)               │ (None, 23, 23, 64)     │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_1           │ (None, 23, 23, 64)     │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ activation_1 (Activation)       │ (None, 23, 23, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 23, 23, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_2 (Conv2D)               │ (None, 23, 23, 64)     │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_2           │ (None, 23, 23, 64)     │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ leaky_re_lu (LeakyReLU)         │ (None, 23, 23, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_3 (Conv2D)               │ (None, 23, 23, 64)     │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_3           │ (None, 23, 23, 64)     │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ leaky_re_lu_1 (LeakyReLU)       │ (None, 23, 23, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d (MaxPooling2D)    │ (None, 11, 11, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 11, 11, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_4 (Conv2D)               │ (None, 11, 11, 128)    │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_4           │ (None, 11, 11, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ activation_2 (Activation)       │ (None, 11, 11, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_5 (Conv2D)               │ (None, 11, 11, 128)    │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_5           │ (None, 11, 11, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ activation_3 (Activation)       │ (None, 11, 11, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_2 (Dropout)             │ (None, 11, 11, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d        │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 64)             │         8,256 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_6           │ (None, 64)             │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ activation_4 (Activation)       │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_3 (Dropout)             │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 11)             │           715 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 344,139 (1.31 MB)
 Trainable params: 342,987 (1.31 MB)
 Non-trainable params: 1,152 (4.50 KB)
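A few of the Param # entries in the summary above can be verified by hand. This is a small arithmetic sketch of the standard parameter-count formulas, not part of the notebook's pipeline:

```python
def conv2d_params(filters, kernel_hw, in_channels):
    kh, kw = kernel_hw
    return filters * (kh * kw * in_channels) + filters  # weights + biases

def dense_params(units, in_features):
    return units * in_features + units  # weights + biases

def batchnorm_params(channels):
    return 4 * channels  # gamma, beta, moving mean, moving variance

print(conv2d_params(64, (3, 3), 1))    # first conv on grayscale input -> 640
print(conv2d_params(64, (3, 3), 64))   # later 64-filter convs -> 36928
print(conv2d_params(128, (3, 3), 64))  # block-3 entry conv -> 73856
print(dense_params(64, 128))           # dense after GAP -> 8256
print(batchnorm_params(64))            # -> 256 (half trainable, half not)
```

The non-trainable total of 1,152 is exactly half of the BatchNormalization parameters, since the moving mean and variance are not trained by gradient descent.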
In [ ]:
best_small_cnn_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_small_cnn2.weights.h5', monitor='val_accuracy', save_best_only=True, save_weights_only=True, mode='max'
)


# Train
best_small_cnn_history = best_small_cnn.fit(
    small_train,
    validation_data=small_val,
    epochs=30,
    batch_size=32,
    verbose=2,
    callbacks=[custom_early_stop, reduce_lr, best_small_cnn_checkpoint]
)
Epoch 1/30
328/328 - 30s - 92ms/step - accuracy: 0.4362 - loss: 1.6755 - val_accuracy: 0.0914 - val_loss: 4.4623 - learning_rate: 0.0018
Epoch 2/30
328/328 - 3s - 8ms/step - accuracy: 0.6357 - loss: 1.1364 - val_accuracy: 0.3973 - val_loss: 2.1620 - learning_rate: 0.0018
Epoch 3/30
328/328 - 3s - 9ms/step - accuracy: 0.7156 - loss: 0.9051 - val_accuracy: 0.5168 - val_loss: 1.5474 - learning_rate: 0.0018
Epoch 4/30
328/328 - 3s - 9ms/step - accuracy: 0.7651 - loss: 0.7677 - val_accuracy: 0.6055 - val_loss: 1.1896 - learning_rate: 0.0018
Epoch 5/30
328/328 - 3s - 8ms/step - accuracy: 0.7956 - loss: 0.6607 - val_accuracy: 0.6886 - val_loss: 0.9974 - learning_rate: 0.0018
Epoch 6/30
328/328 - 3s - 8ms/step - accuracy: 0.8250 - loss: 0.5694 - val_accuracy: 0.6341 - val_loss: 1.4299 - learning_rate: 0.0018
Epoch 7/30
328/328 - 3s - 8ms/step - accuracy: 0.8445 - loss: 0.5181 - val_accuracy: 0.8432 - val_loss: 0.4943 - learning_rate: 0.0018
Epoch 8/30
328/328 - 5s - 16ms/step - accuracy: 0.8614 - loss: 0.4668 - val_accuracy: 0.7895 - val_loss: 0.6491 - learning_rate: 0.0018
Epoch 9/30
328/328 - 5s - 16ms/step - accuracy: 0.8747 - loss: 0.4154 - val_accuracy: 0.7145 - val_loss: 0.8420 - learning_rate: 0.0018
Epoch 10/30

Epoch 10: ReduceLROnPlateau reducing learning rate to 0.0008792486041784286.
328/328 - 3s - 8ms/step - accuracy: 0.8816 - loss: 0.3907 - val_accuracy: 0.6718 - val_loss: 1.0666 - learning_rate: 0.0018
Epoch 11/30
328/328 - 5s - 16ms/step - accuracy: 0.9217 - loss: 0.2811 - val_accuracy: 0.8873 - val_loss: 0.3458 - learning_rate: 8.7925e-04
Epoch 12/30
328/328 - 5s - 16ms/step - accuracy: 0.9295 - loss: 0.2547 - val_accuracy: 0.8709 - val_loss: 0.4351 - learning_rate: 8.7925e-04
Epoch 13/30
328/328 - 5s - 16ms/step - accuracy: 0.9393 - loss: 0.2314 - val_accuracy: 0.8423 - val_loss: 0.5182 - learning_rate: 8.7925e-04
Epoch 14/30

Epoch 14: ReduceLROnPlateau reducing learning rate to 0.0004396243020892143.
328/328 - 3s - 8ms/step - accuracy: 0.9444 - loss: 0.2153 - val_accuracy: 0.8305 - val_loss: 0.5314 - learning_rate: 8.7925e-04
Epoch 15/30
328/328 - 3s - 8ms/step - accuracy: 0.9568 - loss: 0.1772 - val_accuracy: 0.9200 - val_loss: 0.2902 - learning_rate: 4.3962e-04
Epoch 16/30
328/328 - 3s - 8ms/step - accuracy: 0.9625 - loss: 0.1621 - val_accuracy: 0.9027 - val_loss: 0.3391 - learning_rate: 4.3962e-04
Epoch 17/30
328/328 - 3s - 9ms/step - accuracy: 0.9639 - loss: 0.1535 - val_accuracy: 0.9168 - val_loss: 0.3148 - learning_rate: 4.3962e-04
Epoch 18/30
328/328 - 5s - 15ms/step - accuracy: 0.9669 - loss: 0.1471 - val_accuracy: 0.9227 - val_loss: 0.2854 - learning_rate: 4.3962e-04
Epoch 19/30
328/328 - 3s - 8ms/step - accuracy: 0.9690 - loss: 0.1386 - val_accuracy: 0.9364 - val_loss: 0.2359 - learning_rate: 4.3962e-04
Epoch 20/30
328/328 - 3s - 8ms/step - accuracy: 0.9701 - loss: 0.1376 - val_accuracy: 0.9355 - val_loss: 0.2347 - learning_rate: 4.3962e-04
Epoch 21/30
328/328 - 3s - 9ms/step - accuracy: 0.9700 - loss: 0.1345 - val_accuracy: 0.9282 - val_loss: 0.2506 - learning_rate: 4.3962e-04
Epoch 22/30
328/328 - 3s - 8ms/step - accuracy: 0.9736 - loss: 0.1221 - val_accuracy: 0.9350 - val_loss: 0.2506 - learning_rate: 4.3962e-04
Epoch 23/30
328/328 - 3s - 9ms/step - accuracy: 0.9750 - loss: 0.1220 - val_accuracy: 0.9395 - val_loss: 0.2317 - learning_rate: 4.3962e-04
Epoch 24/30
328/328 - 3s - 8ms/step - accuracy: 0.9762 - loss: 0.1195 - val_accuracy: 0.9532 - val_loss: 0.1915 - learning_rate: 4.3962e-04
Epoch 25/30
328/328 - 3s - 8ms/step - accuracy: 0.9772 - loss: 0.1142 - val_accuracy: 0.9386 - val_loss: 0.2225 - learning_rate: 4.3962e-04
Epoch 26/30
328/328 - 5s - 15ms/step - accuracy: 0.9799 - loss: 0.1062 - val_accuracy: 0.9059 - val_loss: 0.3116 - learning_rate: 4.3962e-04
Epoch 27/30

Epoch 27: ReduceLROnPlateau reducing learning rate to 0.00021981215104460716.
328/328 - 3s - 8ms/step - accuracy: 0.9767 - loss: 0.1091 - val_accuracy: 0.9491 - val_loss: 0.2079 - learning_rate: 4.3962e-04
Epoch 28/30
328/328 - 3s - 8ms/step - accuracy: 0.9819 - loss: 0.0967 - val_accuracy: 0.9468 - val_loss: 0.2196 - learning_rate: 2.1981e-04
Epoch 29/30
328/328 - 5s - 16ms/step - accuracy: 0.9848 - loss: 0.0922 - val_accuracy: 0.9582 - val_loss: 0.1832 - learning_rate: 2.1981e-04
Epoch 30/30
328/328 - 3s - 8ms/step - accuracy: 0.9873 - loss: 0.0872 - val_accuracy: 0.9541 - val_loss: 0.2007 - learning_rate: 2.1981e-04

Learning Curve of model¶

In [ ]:
plot_history(best_small_cnn_history)

Insights from Learning Curve:¶

  • Good Generalization:
    From around epoch 15 onward, training and validation accuracy converge toward roughly 95%. Validation loss settles low and roughly tracks training loss, so the model is neither strongly over- nor under-fitting.

  • Early-epoch noise:
    There are noticeable spikes in validation loss around epochs 6 and 10, with corresponding dips in validation accuracy.

  • Smooth convergence later:
    After ~15 epochs, loss declines smoothly and accuracy rises steadily. This suggests our network capacity is sufficient for the task, and that it eventually "absorbs" the noise in the training signal.

101x101 Images¶

Best CNN¶

In [ ]:
def build_best_large_custom_cnn():
    # Best hyperparameters
    filters_block1 = 64
    filters_block2 = 64
    dense_units = 128
    dropout_rate = 0.2
    l2_reg = 0.0002833715647014918

    act_block1 = 'relu'
    act_block2 = 'relu'
    act_block3 = 'relu'
    act_dense = 'relu'

    optimizer_choice = 'adam'
    learning_rate = 0.0010643000310335154

    def get_activation(act_name):
        if act_name == 'relu':
            return tf.keras.layers.ReLU()
        elif act_name == 'leaky_relu':
            return tf.keras.layers.LeakyReLU()
        elif act_name == 'elu':
            return tf.keras.layers.ELU()
        else:
            raise ValueError(f"Unsupported activation: {act_name}")

    # Optimizer
    if optimizer_choice == 'adam':
        optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    elif optimizer_choice == 'adamax':
        optimizer = tf.keras.optimizers.Adamax(learning_rate=learning_rate)
    elif optimizer_choice == 'nadam':
        optimizer = tf.keras.optimizers.Nadam(learning_rate=learning_rate)
    elif optimizer_choice == 'sgd':
        optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)
    else:
        raise ValueError(f"Unsupported optimizer: {optimizer_choice}")

    reg = tf.keras.regularizers.l2(l2_reg)

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(101, 101, 1)),

        # Block 1
        tf.keras.layers.Conv2D(filters_block1, (3, 3), padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_block1),
        tf.keras.layers.Conv2D(filters_block1, (3, 3), padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_block1),
        tf.keras.layers.Dropout(dropout_rate),

        # Block 2
        tf.keras.layers.Conv2D(filters_block2, (3, 3), padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_block2),
        tf.keras.layers.Conv2D(filters_block2, (3, 3), padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_block2),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(dropout_rate),

        # Block 3
        tf.keras.layers.Conv2D(128, (3, 3), padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_block3),
        tf.keras.layers.Conv2D(128, (3, 3), padding='same', kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_block3),
        tf.keras.layers.Dropout(dropout_rate),

        # Classifier
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(dense_units, kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        get_activation(act_dense),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(11, activation='softmax')
    ])

    model.compile(
        optimizer=optimizer,
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    return model
In [ ]:
best_large_cnn = build_best_large_custom_cnn()
best_large_cnn.summary()
Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv2d_6 (Conv2D)               │ (None, 101, 101, 64)   │           640 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_7           │ (None, 101, 101, 64)   │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ re_lu_7 (ReLU)                  │ (None, 101, 101, 64)   │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_7 (Conv2D)               │ (None, 101, 101, 64)   │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_8           │ (None, 101, 101, 64)   │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ re_lu_8 (ReLU)                  │ (None, 101, 101, 64)   │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_4 (Dropout)             │ (None, 101, 101, 64)   │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_8 (Conv2D)               │ (None, 101, 101, 64)   │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_9           │ (None, 101, 101, 64)   │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ re_lu_9 (ReLU)                  │ (None, 101, 101, 64)   │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_9 (Conv2D)               │ (None, 101, 101, 64)   │        36,928 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_10          │ (None, 101, 101, 64)   │           256 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ re_lu_10 (ReLU)                 │ (None, 101, 101, 64)   │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_1 (MaxPooling2D)  │ (None, 50, 50, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_5 (Dropout)             │ (None, 50, 50, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_10 (Conv2D)              │ (None, 50, 50, 128)    │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_11          │ (None, 50, 50, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ re_lu_11 (ReLU)                 │ (None, 50, 50, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_11 (Conv2D)              │ (None, 50, 50, 128)    │       147,584 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_12          │ (None, 50, 50, 128)    │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ re_lu_12 (ReLU)                 │ (None, 50, 50, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_6 (Dropout)             │ (None, 50, 50, 128)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_1      │ (None, 128)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 128)            │        16,512 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ batch_normalization_13          │ (None, 128)            │           512 │
│ (BatchNormalization)            │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ re_lu_13 (ReLU)                 │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_7 (Dropout)             │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_3 (Dense)                 │ (None, 11)             │         1,419 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 353,355 (1.35 MB)
 Trainable params: 352,075 (1.34 MB)
 Non-trainable params: 1,280 (5.00 KB)
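The summary shows odd spatial sizes shrinking from 101 to 50 (and, in the small model, 23 to 11) after MaxPooling2D. That floor behaviour follows from Keras' default 'valid' padding for pooling; a quick sketch of the output-size formula:

```python
def pooled_size(n, pool=2, stride=2):
    """Spatial size after pooling with 'valid' padding (the Keras default)."""
    return (n - pool) // stride + 1

print(pooled_size(101))  # -> 50, matching the summary above
print(pooled_size(23))   # -> 11, matching the small model's summary
```

Because the classifier uses GlobalAveragePooling2D, the head's parameter count is independent of these spatial sizes, which is why both the 23x23 and 101x101 models can share the same architecture.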
In [ ]:
best_large_cnn_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_large_cnn_3.weights.h5', monitor='val_accuracy', save_best_only=True, save_weights_only=True, mode='max'
)


best_large_cnn_history = best_large_cnn.fit(
    large_train,
    validation_data=large_val,
    epochs=30,
    batch_size=32,
    verbose=2,
    callbacks=[custom_early_stop, reduce_lr, best_large_cnn_checkpoint]
)
Epoch 1/30
328/328 - 53s - 162ms/step - accuracy: 0.4657 - loss: 1.7233 - val_accuracy: 0.0859 - val_loss: 6.1868 - learning_rate: 0.0011
Epoch 2/30
328/328 - 33s - 102ms/step - accuracy: 0.6957 - loss: 1.0842 - val_accuracy: 0.4155 - val_loss: 2.7520 - learning_rate: 0.0011
Epoch 3/30
328/328 - 34s - 102ms/step - accuracy: 0.7848 - loss: 0.8309 - val_accuracy: 0.2827 - val_loss: 5.8604 - learning_rate: 0.0011
Epoch 4/30
328/328 - 34s - 102ms/step - accuracy: 0.8457 - loss: 0.6619 - val_accuracy: 0.4227 - val_loss: 3.2960 - learning_rate: 0.0011
Epoch 5/30
328/328 - 34s - 102ms/step - accuracy: 0.8756 - loss: 0.5554 - val_accuracy: 0.6777 - val_loss: 1.4370 - learning_rate: 0.0011
Epoch 6/30
328/328 - 34s - 102ms/step - accuracy: 0.9024 - loss: 0.4844 - val_accuracy: 0.4268 - val_loss: 2.9050 - learning_rate: 0.0011
Epoch 7/30
328/328 - 33s - 102ms/step - accuracy: 0.9137 - loss: 0.4481 - val_accuracy: 0.3232 - val_loss: 5.0418 - learning_rate: 0.0011
Epoch 8/30

Epoch 8: ReduceLROnPlateau reducing learning rate to 0.0005321500357240438.
328/328 - 34s - 104ms/step - accuracy: 0.9224 - loss: 0.4185 - val_accuracy: 0.5068 - val_loss: 2.0721 - learning_rate: 0.0011
Epoch 9/30
328/328 - 34s - 102ms/step - accuracy: 0.9555 - loss: 0.3108 - val_accuracy: 0.8077 - val_loss: 0.7781 - learning_rate: 5.3215e-04
Epoch 10/30
328/328 - 33s - 102ms/step - accuracy: 0.9617 - loss: 0.2853 - val_accuracy: 0.9359 - val_loss: 0.3511 - learning_rate: 5.3215e-04
Epoch 11/30
328/328 - 33s - 102ms/step - accuracy: 0.9654 - loss: 0.2697 - val_accuracy: 0.7932 - val_loss: 0.7650 - learning_rate: 5.3215e-04
Epoch 12/30
328/328 - 33s - 102ms/step - accuracy: 0.9667 - loss: 0.2645 - val_accuracy: 0.7636 - val_loss: 1.1191 - learning_rate: 5.3215e-04
Epoch 13/30

Epoch 13: ReduceLROnPlateau reducing learning rate to 0.0002660750178620219.
328/328 - 34s - 104ms/step - accuracy: 0.9646 - loss: 0.2598 - val_accuracy: 0.7914 - val_loss: 0.8200 - learning_rate: 5.3215e-04
Epoch 14/30
328/328 - 41s - 124ms/step - accuracy: 0.9844 - loss: 0.1980 - val_accuracy: 0.9832 - val_loss: 0.1972 - learning_rate: 2.6608e-04
Epoch 15/30
328/328 - 33s - 102ms/step - accuracy: 0.9877 - loss: 0.1828 - val_accuracy: 0.8805 - val_loss: 0.4814 - learning_rate: 2.6608e-04
Epoch 16/30
328/328 - 33s - 102ms/step - accuracy: 0.9881 - loss: 0.1785 - val_accuracy: 0.9068 - val_loss: 0.4690 - learning_rate: 2.6608e-04
Epoch 17/30

Epoch 17: ReduceLROnPlateau reducing learning rate to 0.00013303750893101096.
328/328 - 34s - 104ms/step - accuracy: 0.9873 - loss: 0.1747 - val_accuracy: 0.8141 - val_loss: 0.6541 - learning_rate: 2.6608e-04
Epoch 18/30
328/328 - 40s - 123ms/step - accuracy: 0.9933 - loss: 0.1534 - val_accuracy: 0.9805 - val_loss: 0.1749 - learning_rate: 1.3304e-04
Epoch 19/30
328/328 - 34s - 104ms/step - accuracy: 0.9951 - loss: 0.1447 - val_accuracy: 0.9914 - val_loss: 0.1501 - learning_rate: 1.3304e-04
Epoch 20/30
328/328 - 33s - 102ms/step - accuracy: 0.9943 - loss: 0.1421 - val_accuracy: 0.9850 - val_loss: 0.1710 - learning_rate: 1.3304e-04
Epoch 21/30
328/328 - 41s - 126ms/step - accuracy: 0.9957 - loss: 0.1374 - val_accuracy: 0.9705 - val_loss: 0.2098 - learning_rate: 1.3304e-04
Epoch 22/30

Epoch 22: ReduceLROnPlateau reducing learning rate to 6.651875446550548e-05.
328/328 - 33s - 102ms/step - accuracy: 0.9966 - loss: 0.1317 - val_accuracy: 0.9505 - val_loss: 0.2438 - learning_rate: 1.3304e-04
Epoch 23/30
328/328 - 33s - 102ms/step - accuracy: 0.9975 - loss: 0.1240 - val_accuracy: 0.9909 - val_loss: 0.1313 - learning_rate: 6.6519e-05
Epoch 24/30
328/328 - 34s - 102ms/step - accuracy: 0.9988 - loss: 0.1201 - val_accuracy: 0.9923 - val_loss: 0.1334 - learning_rate: 6.6519e-05
Epoch 25/30
328/328 - 34s - 102ms/step - accuracy: 0.9983 - loss: 0.1182 - val_accuracy: 0.9945 - val_loss: 0.1249 - learning_rate: 6.6519e-05
Epoch 26/30
328/328 - 33s - 102ms/step - accuracy: 0.9985 - loss: 0.1158 - val_accuracy: 0.9927 - val_loss: 0.1288 - learning_rate: 6.6519e-05
Epoch 27/30
328/328 - 33s - 102ms/step - accuracy: 0.9995 - loss: 0.1118 - val_accuracy: 0.9905 - val_loss: 0.1361 - learning_rate: 6.6519e-05
Epoch 28/30
328/328 - 34s - 104ms/step - accuracy: 0.9979 - loss: 0.1119 - val_accuracy: 0.9918 - val_loss: 0.1204 - learning_rate: 6.6519e-05
Epoch 29/30
328/328 - 41s - 124ms/step - accuracy: 0.9986 - loss: 0.1089 - val_accuracy: 0.9895 - val_loss: 0.1281 - learning_rate: 6.6519e-05
Epoch 30/30
328/328 - 33s - 102ms/step - accuracy: 0.9985 - loss: 0.1069 - val_accuracy: 0.9873 - val_loss: 0.1360 - learning_rate: 6.6519e-05
In [ ]:
plot_history(best_large_cnn_history)

Insights from Learning Curve:¶

  • Slower ramp-up:
    The low starting validation accuracy (~8.6%) and very high validation loss (~6.2) in epoch 1 show that the full-sized images pose a harder learning problem than the tiny 23x23 patches. It takes until roughly epoch 9 before validation accuracy first crosses 80%.

  • Mid-training volatility:
    Validation loss swings widely, with spikes around epochs 3, 7, 12, and 17, and corresponding dips in validation accuracy.

  • Convergence and plateau:
    After epoch 20, both curves smooth out: validation accuracy climbs into the high 90s and validation loss drops below 0.2. This indicates the model eventually "learns through" the extra variability, converging cleanly in the later epochs with validation accuracy tracking training accuracy closely.

Model Evaluation¶

In [ ]:
small_test = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/Dataset for CA1 part A - AY2526S1/test",
    color_mode="grayscale",
    batch_size=32,
    image_size=(23,23),
    shuffle=True,
    seed=123
)

large_test = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/Dataset for CA1 part A - AY2526S1/test",
    color_mode="grayscale",
    batch_size=32,
    image_size=(101, 101),
    shuffle=True,
    seed=123,
    labels='inferred',
    label_mode="int"
)

small_test = small_test.map(normalize_img)
large_test = large_test.map(normalize_img)
Found 2200 files belonging to 11 classes.
Found 2200 files belonging to 11 classes.
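normalize_img is defined earlier in the notebook; the stand-in below is an assumption shown for context only, illustrating the usual rescaling of 8-bit pixel values into [0, 1] that such a mapping function performs:

```python
import numpy as np

# Illustrative stand-in for normalize_img (the real function is defined
# earlier in the notebook); it rescales uint8 pixels to floats in [0, 1].
def normalize_sketch(image):
    return image.astype(np.float32) / 255.0

pixels = np.array([0, 128, 255], dtype=np.uint8)
print(normalize_sketch(pixels))  # values now lie within [0.0, 1.0]
```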
In [ ]:
# Evaluate model on the test dataset
small_cnn_loss, small_cnn_accuracy = best_small_cnn.evaluate(small_test)

print("CNN Test accuracy:", small_cnn_accuracy)
69/69 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step - accuracy: 0.9611 - loss: 0.1566
CNN Test accuracy: 0.9595454335212708
In [ ]:
# Evaluate model on the test dataset
large_cnn_loss, large_cnn_accuracy = best_large_cnn.evaluate(large_test)

print("CNN Test accuracy:", large_cnn_accuracy)
69/69 ━━━━━━━━━━━━━━━━━━━━ 3s 36ms/step - accuracy: 0.9892 - loss: 0.1369
CNN Test accuracy: 0.9890909194946289

We can observe that the CNN trained and tested on 101x101 input performs better than the CNN trained on 23x23 input. We will further discuss this in our conclusion.

Models' Weights¶

We reload the models' weights to confirm their functionality and to ease reproducibility in the following sections. The optimizer warnings below are expected: the models are rebuilt from scratch, so only the layer weights are restored, not the saved optimizer state.

In [ ]:
best_small_cnn = build_small_custom_cnn_best()
best_small_cnn.load_weights('/content/drive/MyDrive/Datasets/best_small_cnn1.weights.h5')

test_loss, test_accuracy = best_small_cnn.evaluate(small_test)
print(f"Test accuracy after loading weights: {test_accuracy:.4f}")
/usr/local/lib/python3.11/dist-packages/keras/src/saving/saving_lib.py:757: UserWarning: Skipping variable loading for optimizer 'adamax', because it has 2 variables whereas the saved optimizer has 62 variables. 
  saveable.load_own_variables(weights_store.get(inner_path))
69/69 ━━━━━━━━━━━━━━━━━━━━ 6s 32ms/step - accuracy: 0.9588 - loss: 0.1557
Test accuracy after loading weights: 0.9600
In [ ]:
best_large_cnn = build_best_large_custom_cnn()
best_large_cnn.load_weights('/content/drive/MyDrive/Datasets/best_large_cnn_3.weights.h5')

test_loss, test_accuracy = best_large_cnn.evaluate(large_test)
print(f"Test accuracy after loading weights: {test_accuracy:.4f}")
/usr/local/lib/python3.11/dist-packages/keras/src/saving/saving_lib.py:757: UserWarning: Skipping variable loading for optimizer 'adam', because it has 2 variables whereas the saved optimizer has 62 variables. 
  saveable.load_own_variables(weights_store.get(inner_path))
69/69 ━━━━━━━━━━━━━━━━━━━━ 6s 54ms/step - accuracy: 0.9894 - loss: 0.1405
Test accuracy after loading weights: 0.9895

Model Metrics¶

Confusion matrices and model layers will be visualized in this section.

In [ ]:
def plot_confusion_matrix(name, model, test_dataset, class_names):
    # Get true labels and predictions
    y_true = []
    y_pred = []

    for X_batch, y_batch in test_dataset:
        # Check if y_batch is one-hot encoded or integer labels
        if len(y_batch.shape) == 2:  # One-hot encoded labels (batch_size, num_classes)
            y_true.extend(np.argmax(y_batch.numpy(), axis=1))
        else:  # Integer labels (batch_size,)
            y_true.extend(y_batch.numpy())

        # Get predictions
        y_pred_probs = model.predict(X_batch, verbose=0)
        y_pred.extend(np.argmax(y_pred_probs, axis=1))  # Predicted class indices

    y_true = np.array(y_true)
    y_pred = np.array(y_pred)

    # Compute the confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names)

    # Plot the confusion matrix
    fig, ax = plt.subplots(figsize=(12, 10))
    disp.plot(cmap=plt.cm.Blues, ax=ax, xticks_rotation=90)
    plt.title(f"Confusion Matrix: {name}", fontsize=16)
    plt.tight_layout()
    plt.show()

    # Class-wise Accuracy
    print("Class-wise Accuracy:\n")
    total_per_class = cm.sum(axis=1)
    correct_per_class = np.diag(cm)
    for i, class_name in enumerate(class_names):
        acc = correct_per_class[i] / total_per_class[i]
        print(f"{class_name:30}: {acc:.2%}")

    # Overall Accuracy
    overall_acc = np.sum(correct_per_class) / np.sum(cm)
    print(f"\nOverall Accuracy: {overall_acc:.2%}")

    print('-'*100)
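
The class-wise accuracy computation above (diagonal counts divided by row sums) can be sanity-checked on a toy confusion matrix:

```python
import numpy as np

# Toy 3-class confusion matrix: rows are true classes, columns are predictions.
cm = np.array([
    [8, 1, 1],   # class 0: 8/10 correct
    [0, 9, 1],   # class 1: 9/10 correct
    [2, 0, 8],   # class 2: 8/10 correct
])
per_class = np.diag(cm) / cm.sum(axis=1)   # [0.8, 0.9, 0.8]
overall = np.diag(cm).sum() / cm.sum()     # 25/30 ~= 0.833
```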

Confusion Matrix for the 23x23 CNN model¶

In [ ]:
plot_confusion_matrix("23x23 CNN Model", best_small_cnn, small_test, class_names=class_names)
No description has been provided for this image
Class-wise Accuracy:

Capsicum                      : 98.00%
Tomato                        : 95.00%
Bitter_Gourd                  : 95.00%
Pumpkin                       : 93.50%
Bean                          : 98.50%
Brinjal                       : 93.00%
Cabbage                       : 99.50%
Cucumber and Bottle_Gourd     : 95.50%
Radish and Carrot             : 94.50%
Potato                        : 98.50%
Cauliflower and Broccoli      : 95.00%

Overall Accuracy: 96.00%
----------------------------------------------------------------------------------------------------
In [ ]:
# Collect true and predicted labels
y_true = []
y_pred = []

for images, labels in small_test:
    predictions = best_small_cnn.predict(images, verbose=0)
    predicted_labels = np.argmax(predictions, axis=1)
    y_true.extend(labels.numpy())
    y_pred.extend(predicted_labels)

# Generate classification report
report = classification_report(y_true, y_pred, target_names=class_names)
print("Classification Report for the 23x23 Model:\n")
print(report)
Classification Report for the 23x23 Model:

                           precision    recall  f1-score   support

                     Bean       0.96      0.98      0.97       200
             Bitter_Gourd       0.98      0.95      0.97       200
                  Brinjal       0.95      0.95      0.95       200
                  Cabbage       0.96      0.94      0.95       200
                 Capsicum       0.99      0.98      0.99       200
 Cauliflower and Broccoli       0.90      0.93      0.92       200
Cucumber and Bottle_Gourd       0.95      0.99      0.97       200
                   Potato       0.97      0.95      0.96       200
                  Pumpkin       0.98      0.94      0.96       200
        Radish and Carrot       0.98      0.98      0.98       200
                   Tomato       0.94      0.95      0.95       200

                 accuracy                           0.96      2200
                macro avg       0.96      0.96      0.96      2200
             weighted avg       0.96      0.96      0.96      2200

Confusion Matrix for the 101x101 best model¶

In [ ]:
plot_confusion_matrix("101x101 CNN Model", best_large_cnn, large_test, class_names=class_names)
No description has been provided for this image
Class-wise Accuracy:

Capsicum                      : 99.50%
Tomato                        : 99.00%
Bitter_Gourd                  : 97.50%
Pumpkin                       : 99.00%
Bean                          : 99.50%
Brinjal                       : 98.00%
Cabbage                       : 100.00%
Cucumber and Bottle_Gourd     : 98.00%
Radish and Carrot             : 99.50%
Potato                        : 99.50%
Cauliflower and Broccoli      : 99.00%

Overall Accuracy: 98.95%
----------------------------------------------------------------------------------------------------
In [ ]:
# Collect true and predicted labels
y_true = []
y_pred = []

for images, labels in large_test:
    predictions = best_large_cnn.predict(images, verbose=0)
    predicted_labels = np.argmax(predictions, axis=1)
    y_true.extend(labels.numpy())
    y_pred.extend(predicted_labels)

# Generate classification report
report = classification_report(y_true, y_pred, target_names=class_names)
print("Classification Report for the 101x101 Model:\n")
print(report)
Classification Report for the 101x101 Model:

                           precision    recall  f1-score   support

                     Bean       0.99      0.99      0.99       200
             Bitter_Gourd       1.00      0.99      0.99       200
                  Brinjal       0.98      0.97      0.98       200
                  Cabbage       0.98      0.99      0.98       200
                 Capsicum       1.00      0.99      1.00       200
 Cauliflower and Broccoli       0.98      0.98      0.98       200
Cucumber and Bottle_Gourd       0.97      1.00      0.98       200
                   Potato       1.00      0.98      0.99       200
                  Pumpkin       0.99      0.99      0.99       200
        Radish and Carrot       1.00      0.99      1.00       200
                   Tomato       0.99      0.99      0.99       200

                 accuracy                           0.99      2200
                macro avg       0.99      0.99      0.99      2200
             weighted avg       0.99      0.99      0.99      2200

Insights from the Confusion Matrices¶

23x23 model: We see quite a few off-diagonal cells (e.g. Brinjal -> Capsicum, Pumpkin -> Brinjal, Cauliflower and Broccoli -> Potato, etc.). Overall accuracy is high, but there's still noticeable "bleed" between visually similar classes.

101x101 model: The vast majority of predictions land on the diagonal. Only a handful of mistakes remain (e.g. Bitter Gourd -> Bottle Gourd, Cauliflower and Broccoli -> Potato, Brinjal -> Pumpkin), and even those are mostly just 1-3 images per class.

Insights: More resolution preserves fine-grained textures and shape cues, letting the network disambiguate pumpkins and capsicums for instance, much more reliably.

On small inputs, any shapes or textures that overlap between “pairs” of vegetables (e.g. the rough rind of both pumpkin and bitter gourd) become easy to confuse.

On larger inputs, those same cues (vein patterns, stalk shapes, surface texture) become distinct again, so the model almost never mistakes one for the other.



What this tells us:
Spatial detail matters: At 23x23, we're sometimes down to a handful of pixels for a leaf edge or intensity gradient; at 101x101, those features are much richer.
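
The raw pixel-count gap between the two resolutions makes this concrete:

```python
# Raw spatial information available at each resolution.
small_pixels = 23 * 23        # 529 pixels
large_pixels = 101 * 101      # 10201 pixels
ratio = large_pixels / small_pixels  # ~19x more pixels at 101x101
```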

Error Analysis¶

Here, we will view the images that our model got wrong for analysis.

In [ ]:
def error_analyze(class_names, model, test):
    # 1. Extract all test images and labels
    X_test = []
    y_test = []

    for images, labels in test:
        X_test.extend(images.numpy())   # convert to NumPy
        y_test.extend(labels.numpy())

    X_test = np.array(X_test)
    y_test = np.array(y_test)

    # 2. Make predictions
    y_pred_probs = model.predict(X_test)
    y_pred_classes = np.argmax(y_pred_probs, axis=1)

    # 3. Identify misclassified indices
    wrong_indices = np.where(y_pred_classes != y_test)[0]

    # 4. Show top-2 predictions for some wrong predictions
    N = 5
    rows = []

    for idx in wrong_indices[:N]:
        probs = y_pred_probs[idx]
        top2 = probs.argsort()[-2:][::-1]  # descending top 2

        rows.append({
            'Index': idx,
            'Actual Label': y_test[idx],
            'Actual Class': class_names[y_test[idx]],
            'Predicted Label': y_pred_classes[idx],
            'Predicted Class': class_names[y_pred_classes[idx]],
            'Top-1 Class': class_names[top2[0]],
            'Top-1 Prob': probs[top2[0]],
            'Top-2 Class': class_names[top2[1]],
            'Top-2 Prob': probs[top2[1]],
        })

    top_errors = pd.DataFrame(rows)
    display(top_errors)
    return (X_test, y_test, rows)


class_names = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/train"))
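
The top-2 extraction used in error_analyze (`probs.argsort()[-2:][::-1]`) can be verified on a toy probability vector:

```python
import numpy as np

probs = np.array([0.05, 0.60, 0.10, 0.25])
top2 = probs.argsort()[-2:][::-1]  # indices of the two largest values, descending
# top2 is [1, 3]: class 1 (0.60) first, then class 3 (0.25)
```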

Error Analysis for 23x23 Model¶

In [ ]:
X_test_small, y_test_small, rows_small = error_analyze(class_names, best_small_cnn, small_test)
69/69 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
Index Actual Label Actual Class Predicted Label Predicted Class Top-1 Class Top-1 Prob Top-2 Class Top-2 Prob
0 91 1 Bitter_Gourd 3 Cabbage Cabbage 0.805765 Bitter_Gourd 0.126022
1 101 4 Capsicum 3 Cabbage Cabbage 0.513060 Capsicum 0.386378
2 106 5 Cauliflower and Broccoli 6 Cucumber and Bottle_Gourd Cucumber and Bottle_Gourd 0.666534 Cauliflower and Broccoli 0.267308
3 143 2 Brinjal 3 Cabbage Cabbage 0.663792 Brinjal 0.130903
4 145 10 Tomato 9 Radish and Carrot Radish and Carrot 0.605660 Cauliflower and Broccoli 0.115429
In [ ]:
N = len(rows_small)
cols = min(N, 5)  # up to 5 images per row
rows_needed = (N + cols - 1) // cols

fig, axes = plt.subplots(rows_needed, cols, figsize=(cols * 3, rows_needed * 3))

if rows_needed == 1:
    axes = np.expand_dims(axes, axis=0)  # make 2D if only 1 row

for i, row in enumerate(rows_small):
    r, c = divmod(i, cols)
    ax = axes[r][c]

    img = X_test_small[row['Index']].squeeze()
    ax.imshow(img, cmap='gray')

    actual = class_names[row['Actual Label']]
    pred = class_names[row['Predicted Label']]
    top1 = row['Top-1 Class']
    top2 = row['Top-2 Class']

    ax.set_title(f"True: {actual}\nPred: {pred}\n1: {top1} ({row['Top-1 Prob']:.2f})\n2: {top2} ({row['Top-2 Prob']:.2f})", fontsize=8)
    ax.axis('off')

# Hide any unused subplots
for i in range(N, rows_needed * cols):
    fig.delaxes(axes.flatten()[i])

plt.tight_layout()
plt.show()
No description has been provided for this image

First image: The image's true label is Bitter Gourd; however, the model was about 81% sure it was Cabbage and only 13% sure it was Bitter Gourd. To the human eye, it is understandable why: the circular shape with the white center resembles a cabbage, so it is a reasonable prediction. Furthermore, the model still assigned 13% confidence to the correct Bitter Gourd class.

Second image: The model's prediction is split between Cabbage (51%) and Capsicum (39%). Both tend to have a similar round shape, so it is a reasonable error.

Third image: To the human eye, it is nearly indistinguishable: there is no distinctive shape hinting that this is a Cauliflower and Broccoli. It is impressive that the model still assigned 27% confidence to the correct class, which shows that our model has potential.

Fourth image: The brinjals in the dataset come in different shapes and sizes. This one has a smooth, circular shape, so predicting Cabbage is reasonable. The model still assigned 13% confidence to Brinjal, which is somewhat impressive.

Lastly, the final image is a Tomato, but it is shaped like a radish; even a human might have guessed Radish and Carrot. It is therefore a reasonable error.

Error Analysis for 101x101 Model¶

In [ ]:
X_test_large, y_test_large, rows_large = error_analyze(class_names, best_large_cnn, large_test)
69/69 ━━━━━━━━━━━━━━━━━━━━ 3s 36ms/step
Index Actual Label Actual Class Predicted Label Predicted Class Top-1 Class Top-1 Prob Top-2 Class Top-2 Prob
0 164 1 Bitter_Gourd 3 Cabbage Cabbage 0.918178 Bitter_Gourd 0.024346
1 233 3 Cabbage 6 Cucumber and Bottle_Gourd Cucumber and Bottle_Gourd 0.909914 Cabbage 0.085638
2 298 7 Potato 6 Cucumber and Bottle_Gourd Cucumber and Bottle_Gourd 0.490984 Potato 0.203611
3 361 5 Cauliflower and Broccoli 3 Cabbage Cabbage 0.804141 Cauliflower and Broccoli 0.165774
4 363 8 Pumpkin 5 Cauliflower and Broccoli Cauliflower and Broccoli 0.818096 Pumpkin 0.122512
In [ ]:
N = len(rows_large)
cols = min(N, 5)  # up to 5 images per row
rows_needed = (N + cols - 1) // cols

fig, axes = plt.subplots(rows_needed, cols, figsize=(cols * 3, rows_needed * 3))

if rows_needed == 1:
    axes = np.expand_dims(axes, axis=0)  # make 2D if only 1 row

for i, row in enumerate(rows_large):
    r, c = divmod(i, cols)
    ax = axes[r][c]

    img = X_test_large[row['Index']].squeeze()
    ax.imshow(img, cmap='gray')

    actual = class_names[row['Actual Label']]
    pred = class_names[row['Predicted Label']]
    top1 = row['Top-1 Class']
    top2 = row['Top-2 Class']

    ax.set_title(f"True: {actual}\nPred: {pred}\n1: {top1} ({row['Top-1 Prob']:.2f})\n2: {top2} ({row['Top-2 Prob']:.2f})", fontsize=8)
    ax.axis('off')

# Hide any unused subplots
for i in range(N, rows_needed * cols):
    fig.delaxes(axes.flatten()[i])

plt.tight_layout()
plt.show()
No description has been provided for this image

First image: The basket holding the bitter gourds has a texture similar to cabbage leaves. This may introduce noise that causes the model to predict unexpectedly, so Cabbage is a justifiable error.

Second image: The model predicts Cucumber and Bottle_Gourd, but the image is clearly a cabbage. This is an unreasonable error; the model predicted this one poorly.

Third image: Potatoes and cucumbers/bottle gourds share a similar structure in grayscale, so the model may confuse either class. Hence, it is a reasonable error.

Fourth image: We can see a cauliflower, but there is background clutter that can degrade the model's prediction. Hence, it is a reasonable error.

Lastly, even though the pumpkin is clearly visible, the background has a texture similar to cauliflower florets, which can pull the prediction toward Cauliflower and Broccoli. Hence, this is a reasonable error.

Model Architecture¶

In [ ]:
tf.keras.utils.plot_model(best_small_cnn, show_shapes=True, show_layer_names=True, dpi=70, to_file='small_model_architecture.png')
Out[ ]:
No description has been provided for this image
In [ ]:
# Use a smaller font for better layout
try:
    font = ImageFont.truetype("arial.ttf", 12)
except:
    font = None

# Render the layered view and display it
image = visualkeras.layered_view(best_small_cnn, legend=True, draw_volume=True, font=font)
display(image)
/usr/local/lib/python3.11/dist-packages/visualkeras/layered.py:86: UserWarning: The legend_text_spacing_offset parameter is deprecated and will be removed in a future release.
  warnings.warn("The legend_text_spacing_offset parameter is deprecated and will be removed in a future release.")
No description has been provided for this image
In [ ]:
tf.keras.utils.plot_model(best_large_cnn, show_shapes=True, show_layer_names=True, dpi=70, to_file='large_model_architecture.png')
Out[ ]:
No description has been provided for this image
In [ ]:
# Use a smaller font for better layout
try:
    font = ImageFont.truetype("arial.ttf", 12)
except:
    font = None

# Render the layered view and display it
image = visualkeras.layered_view(best_large_cnn, legend=True, draw_volume=True, font=font)
display(image)
/usr/local/lib/python3.11/dist-packages/visualkeras/layered.py:86: UserWarning: The legend_text_spacing_offset parameter is deprecated and will be removed in a future release.
  warnings.warn("The legend_text_spacing_offset parameter is deprecated and will be removed in a future release.")
No description has been provided for this image

Comparing the classification accuracies of both 23x23 and 101x101 models.¶

We can observe that the model for the 23x23 images had an accuracy of about 95-96%, whereas the model for the 101x101 images had an accuracy of about 99%.

*(values may vary slightly across reruns)

Why does the model trained and tested on 101x101 images perform better?¶

  • Higher Spatial Resolution.

Higher spatial resolution typically means more informative features. Fine patterns, edges, or textures that help distinguish between classes may get lost during aggressive downscaling.

At 23x23, a single convolution kernel might cover a large proportion of the object, reducing the model's ability to detect localized, discriminative features.
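
To quantify this, compare the fraction of the input that a single 3x3 kernel placement covers at each resolution (illustrative arithmetic; the models' actual kernel sizes may differ):

```python
# Fraction of the input image covered by one placement of a square kernel.
def kernel_coverage(image_side, kernel_side=3):
    return (kernel_side ** 2) / (image_side ** 2)

small = kernel_coverage(23)    # ~1.7% of a 23x23 image per placement
large = kernel_coverage(101)   # ~0.09% of a 101x101 image per placement
```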


  • Better Generalization

When the input is rich in detail, the model learns more generalizable and discriminative features, leading to improved accuracy on unseen data.

23x23 inputs might lead the model to memorize coarse features (e.g., object shape) but miss nuances (e.g., texture, borders).

Insights¶




Model trained on 23x23 Images¶

  • Advantages

The model trained with 23x23 pixel images exhibits a significantly faster training time compared to the model trained with 101x101 images. This is largely due to the smaller input size, which requires fewer computations for both forward and backward passes during training.


This makes the 23x23 model a more attractive option for scenarios where quick model iteration is necessary, such as real-time applications. Moreover, the smaller input size also means the 23x23 model is more suitable for deployment on edge devices, such as mobile phones or IoT devices, which often have limited processing power and memory.

  • Disadvantages

However, while the 23x23 model performs quite efficiently in terms of speed, there is a noticeable trade-off in accuracy. The 95% accuracy is respectable, but it comes at the cost of losing fine-grained information available in higher resolution images.


The relatively lower accuracy may not be a significant issue in many real-time applications, such as face recognition, object detection in low-resource environments, or simple classification tasks where a small error margin is acceptable. However, for more sensitive applications, the decrease in accuracy could lead to undesirable outcomes.




Model Trained on 101x101 Images¶

  • Advantages

On the other hand, the model trained with 101x101 images achieves an accuracy of 99%, which is clearly superior in terms of predictive power. The increased resolution allows the model to capture more fine-grained details, which can be crucial in domains requiring high precision.


For example, in medical AI applications, such as cancer detection or diagnostic imaging, the ability to discern subtle patterns or abnormalities in high-resolution images can be the difference between an accurate diagnosis and a false one. In such cases, sacrificing accuracy for speed or resource efficiency would not be acceptable.

  • Disadvantages

The trade-off, however, is that the 101x101 model requires considerably more computational resources, both in terms of memory and processing power. Larger input sizes increase the complexity of the model, leading to longer training times and higher resource consumption during inference. For deployment on devices with limited computational capabilities, this could pose significant challenges.


Additionally, the increased memory requirements for higher-resolution images could limit the scalability of the model when dealing with large datasets.




To summarize:¶


The choice between 23x23 and 101x101 image resolutions is a balancing act between speed and accuracy.


For real-time applications and deployment on edge devices, the 23x23 resolution may be more practical despite the slight accuracy loss.


Conversely, when high precision is crucial, such as in medical AI or high-stakes industrial applications, the 101x101 resolution offers superior accuracy, at the cost of increased computational demand.



Ultimately, the decision on which model to deploy depends on the specific requirements of the task at hand, the computational resources available, and the acceptable margin of error for the application. Further experimentation, including the use of intermediate resolutions or techniques such as transfer learning, may help mitigate some of these trade-offs.